Approaches to research


Stuart Hannabuss 


School of Librarianship and Information Studies, The Robert Gordon University, 


352 King Street, Aberdeen AB9 2TQ 


Abstract 


There is an increasing emphasis on research in library and information studies. This has led to a proliferation of 
courses on research methods. For people starting research, as well as for teachers organising such courses, the 
experience has been exciting and complex. Not only has it involved the identification and development of eclectic 
research ideas, but it has also led to a deeper examination of the relationship between theory and practice. Critical 
too has been the interface between research which leads to financial reward and research conducted for purely 
academic reasons (and the area that lies in between). Political factors have also been at work, from the
perception among new entrants to the profession that a master's degree is desirable, demonstrating mastery
of at least fundamental researching skills, to the momentum in higher education to provide a wide range of 
challenging courses which purport to ensure competitive advantage for their graduates in the market-place. 


complex theoretical and political hinterland, which 
needs to be carried out systematically, explained 
clearly, and evaluated convincingly. 

The research process itself is often represented as 
a series of stages or steps. Starting research is where 
the need or opportunity for research is identified and 
where research ideas or questions begin to emerge 
and gradually become crystallized. The characteris- 
tic next step is a review of the literature, to determine 
what has been written and done already on the sub- 
ject, and what findings already exist. It may be that 
the research ideas or questions emerge out of the 
review. They often arise from a variety of sources - 
literature, immediate or prospective problems in the 
workplace, a perceived need for evidence to support 
a management decision, research a sponsor wants 
carried out, a consultancy. After the literature review 
comes specifying the research problem, which in- 
volves representing what you want to know in words 
and in the form of a viable series of tasks. Then 
comes the study itself, followed by a review of the 
implications and outcomes, with possible recommendations.
Implicit in the rationale throughout is
why the study was conducted in the way that it was, 
and what it contributes (to known knowledge or to 
solving a particular problem). 

Specialized terminology is often used of the 
stages, such as justification of why the research 
question is important, empirical work for the experi- 
ments or surveys which form the study, analysis and 
critique for the systematic evaluation of the findings,
and discussion for a review of the methods used and 
speculations about the inferences and implications 
of the research. The literature review consists not 
merely of what is written but also key ideas and 
approaches of conceptual and historiographic value 
in their own right, and may more correctly be called 
a theoretical framework. This in its turn may lead the 
researcher to investigate the intellectual or ideologi- 
The research process
Research, pure and applied, is carried out for many 
reasons. In the social sciences, where arguably most 
library and information research is based, it is in part
a search for knowledge for its own sake. Examples 
include finding out more about what people think, 
how they behave (say, in information-gathering), 
and the extent to which information-awareness influ- 
ences decision-making. More and more today, the 
search for knowledge has a pragmatic or utilitarian 
purpose, since managers and sponsors usually want 
to know about the implications for practice, for the 
organization or for society. In its study ‘Target 2000:
projecting British social science’, the Association of 
Learned Societies in the Social Sciences have identi- 
fied applications such as the study of work and 
organizations, political processes, rural and urban 
change, and human resource issues, as being appro- 
priate targets for such applied research. Certainly, 
in-house research in library and information services 
usually has that practical focus and rationale, since 
research is expensive in time and intellectual energy, 
and must be justified in terms of effective outcomes. 
Not unimportant in research is the identification 
and use of research methods (or, when defined more 
systematically, methodologies). These methods can 
be classified as quantitative or qualitative, and both 
in their various ways attempt to describe and explain 
the social phenomena under review. These phenom- 
ena may be user behaviours or attitudes, relationships 
between variables like the age and use of documents, 
the representativeness of sample evidence, and sig- 
nificant exceptions revealed through detailed 
investigation of costs or prices or delivery lead- 
times. In their turn, the choice and implementation of 
research methods impose intellectual and practical 
constraints on the researcher, not least of which is the 
extent to which the validity of the method can be 
explained. Research, then, is a process set within a 


or cash-flow models into a study of budgeting practice
in a large library). It may be an abstraction in
search of proof, such as a view of human motivation 
or adaptation to change. 


The research proposal 

Two important trends in library and information 
research have been the need to communicate the 
research plan clearly and persuasively, and the inten- 
tion to attract financial support. 

Even when students in universities are engaged 
in putting together statements and outlines for course-assessed
work or higher degrees, it is critical to learn
the skill of communicating the content and approach
of the research in a convincing and attractive way.
Skills learned early on may, through continuing 
professional development, grow into mature research, 
development, and consultancy skills. The vehicle for 
communicating the claims and scope of intended
research is the research proposal. 

This may be concise or extended, but whatever 
else it does it should provide a clear idea of the 
research problem (i.e. the goal of the project). This 
may indeed be a problem (like finding ways of
building quality into information provision), or an 
issue (like the implications of income-generation or 
affirmative action), or a critical focus (like current 
library practice in summarizing and displaying man- 
agement information in regular reports). It is often 
difficult to identify what the importance of the pro- 
posed study is without special pleading. Such claims 
always need to be made knowing current and retrospective
literature on the subject, and demonstrating
familiarity with available and appropriate research 
methods. A major role for the proposal is to make 
clear what methods will be used and how: the suitability
of the methods, and the ways in which they
are implemented, will critically affect the outcomes 
and cogency of the research. For instance, surveys 
are useful for behavioural research like user studies, 
and may involve deliberate use of questionnaires, 
samples, and statistical analysis. On the other hand, 
more humanistic research may entail extended un- 
structured interviews with junior managers where 
selection of respondents, interpretation of data, and
generalizability will be very different. 

Since the proposal is a distillation of the puta- 
tively completed research project or programme, it 
is, paradoxically enough, both an indication of what 
will be and what will have been. This means that an
effective proposal has succeeded, in advance of completing
the actual research, in demonstrating a
convincingly thorough appreciation of the problems,
constraints, opportunities, and outcomes of the re- 
search. Not only is a good research proposal able to 
project ahead in this way: it is also able to show a 
strongly self-critical and reflective emphasis. This 
factor reveals itself in many ways, from the extent to 
which validity and reliability are built into the design 
to the ways in which the researcher is able to argue 
cal paradigm within which the research is to be
carried out (examples of such paradigms are the 
capitalist information market-place and the ways in 
which, say, economists think about economics and 
create and disseminate information within that sub- 
ject domain). More specialized still are stages of the 
research process where research propositions are 
formulated, hypotheses framed, and a formal re- 
search design created. 

The early stages of the process often consist of 
considering research ideas, expressing them as state- 
ments, building them up into an outline, and 
developing the formal research design. This entails 
moving from questions like ‘What do I want to 
know?’ and ‘How can it be investigated?’ to the more 
systematic position of outlining exactly what is to be 
studied, what sources and methods used, and what 
outcomes are likely. Research questions should be 
clear and precise: e.g. ‘What information sources are 
most used by middle managers in the oil industry?’. 
Often there are difficulties of definition, as when the 
question asks whether a particular form of informa- 
tion provision has been ‘effective’, and ‘effective’ 
itself has to be defined further. In such cases it is 
essential to consider how concepts like effectiveness 
can be measured so that a viable and credible re- 
search design can evolve. The research design itself 
subsumes the research statements and questions, provides
the framework (e.g. theory/practice, literature
review, other research), identifies claims or hypo- 
theses, defines methodologies and ways of gathering 
and analysing data, the working schedule of the 
research, and intended findings and contribution to 
the world of knowledge and/or managerial decision- 
making. The process, then, can be characterized as a 
movement towards greater understanding, defini- 
tion, and complexity. An important aspect of it is 
manageability in terms of the time, cost, resources, 
and researcher ability and interest. 

The role of theory is always complex. Theory 
may provide the starting point for research; for ex- 
ample, a view of society or of individual behaviour 
may inform a piece of research on the provision of 
information products and services to particular user 
groups. Theories have been called intellectual sys- 
tems looking for empirical reference, that is to say, 
eligible for research which tests whether their claims
have any basis in reality. Examples include theories 
about the way people learn and the provision of 
training or information skills, and about the ways in 
which scientific knowledge develops and the ways in 
which this is reflected in the literature. At times, 
theory expresses itself in the form of models, like 
models of information exchange and flow, which 
may then be used as important formulations of the 
ways variables interact and as the basis for practical 
research into particular situations. Theory may be 
imported from one discipline into another (e.g. the 
value-chains of competitive advantage marketing 
into an examination of ‘the intelligent organization’, 
as when theoretical models of organizational design
or human behaviour are used as starting points for 
the investigation. The work-place emphasis and the 
direct application of the research to decision-making 
often lead people to call applied research ‘action
research’.

Research may entail the examination of histori- 
cal evidence, financial and otherwise, with the 
intention of determining particular facts or trends, 
investigating causes and identifying important cy- 
cles. Such historical research may be applied to 
management situations (such as policy changes in 
higher education and implications for library provi- 
sion) or to bibliographical areas and their readership 
and sociology (such as historiographic studies, literacy
and cultural research, and the development of
academic communities and disciplines). It may take
on a bibliometric dimension if statistical analysis is
applied to the content of journals or the publication 
patterns of monographs, and content analysis may be 
incorporated in such research if citations or concepts 
or particular themes are investigated for the fre- 
quency and impact of their appearance in the 
literature. 

One familiar distinction is that of quantitative 
and qualitative research, the first with its emphasis
on measurement and testing and the second with its 
emphasis on understanding participants and factors 
in context. Quantitative research in the social sciences
is often called ‘scientific’ because it often
formulates research problems in the form of testable
hypotheses, attempts to identify and measure relationships
between variables, and strives to minimise
researcher interference. This research is called 
'hypothetico-deductive' because it uses hypotheses 
or formal claims and sets out to test or prove them, 
and uses an approach to evidence which examines 
how far and well observations confirm the initial
‘thesis’ or ‘claim’ or ‘law’. Indeed, hypotheses are 
often regarded as propositions that lead to the pre- 
diction of facts under given circumstances. 
Researchers attempt to ensure that the circumstances 
are as controlled as possible, although the strict 
laboratory controls characteristic of chemical or en- 
gineering experiments are not always possible or 


appropriate in the social sciences. 


Quantitative research often identifies and exam- 
ines variables. Examples of these include the 
frequency of journal issues, prices of chemistry mono- 
graphs, absenteeism of junior staff, and visits to the 
library of elderly users. Hypothetical relationships 
between data can be determined, e.g. the extent to
which job satisfaction, however measured, reduces 
absenteeism among staff. Systematic methods of 
data collection and analysis are used, often based on 
statistical techniques like sampling and significance 
testing. Such approaches are suitable where data fall 
plausibly into natural and convincing categories,
like usage and costing figures, and where correla- 
tions and causal relationships and group differences 


objectively that the research has overcome practical
problems and has made realistic and tenable compro- 
mises over its limitations. 

In addition, the research proposal should give a 
clear indication of how the data or findings will be 
interpreted or analysed. Data may be numerical, like 
statistics or costings, or textual, like statements re- 
corded on tape in interviews. In both cases, however, 
it is essential to indicate how and why the data have 
been evaluated, and the relationship of the findings 
(say, particular causal relationships between vari- 
ables, particular patterns in the qualitative 
information) with the original claims of the research. 
The original claims may find corroboration from the 
findings, or may undergo qualification as a result of 
them. 

The proposal should point, however tentatively, 
towards the expected results or, more generally, the 
likely impact of the research. There is therefore a 
clear link between the aims and outcomes of the 
projected research, and a satisfactory coherence in
the proposal documentation. 

Often underplayed in research proposals are im- 
portant practical elements like resources, staff, budget, 
and agenda. Resources, simply, are what the institu- 
tion can provide by way of computer facilities and 
libraries. The staff may be the researcher or research 
assistants, as well as an indication of who else 
(organization contacts, respondents) may be involved. 
Budget costs resources and staff, estimates likely 
expenditure on travel and equipment, and dissemina- 
tion of findings. Costings are particularly important 
when seeking sponsored research (say, from the 
British Library), and are often divided into catego- 
ries like recurrent and non-recurrent. Agenda deals 
with the context within which the research is planned 
and takes place: this is often political, with competi- 
tive interest groups and opportunistic collaborations. 
In organizational research, it is often characterized 
by tensions between academic and workplace re- 
search (in terms of expectations) and by the diplomatic 
and ethical dilemmas posed when conducting re- 
search in situations where stakeholders subscribe to 
different agendas. Agendas are further complicated 
by issues of confidentiality and anonymity, by how 
far research encounters are overt rather than covert, 
and how far research findings are massaged into 
acceptable PR at the end of the day. 


Types of research 

There are many types of research. In library and 
information studies applied research, as opposed to 
pure research, is very popular. Applied research is 
pragmatic and stresses the importance of gathering 
and analysing information which can be used in 
resolving real-life problems. Examples include the 
in-house research a library might conduct into is- 
sues, duplication, opening hours, efficiency of 
departments, and so on. There may be pure or ‘basic’ 
research components built into such applied research, 
Gummesson (1991) summarizes some of the
major differences between these two approaches, 
calling them positivistic and hermeneutic. Positiv- 
ism stresses rules and norms by which we can explore 
and explain phenomena objectively, and defines valid 
knowledge and inquiry in scientific terms. The 
hermeneutic approach emphasizes understanding, 
perception, idiosyncrasy, and the participants’ own 
ways of making sense of experience, and has other 
names like humanistic, naturalistic, illuminative, 
exploratory, and qualitative. For Gummesson posi- 
tivism concentrates on description and explanation, 
on well-defined studies, on explicit theories and 


hypotheses, clear distinctions between facts and val- 


ues, rationality and logic, statistical techniques and 
detachment. The hermeneutic approach, on the other 
hand, works on different principles — understanding, 
above all understanding the social world from the
point of view of the actor (e.g. the middle manager in
the organization), holistic studies, a recognition that 
there is no black-and-white distinction between facts 
and values, non-quantitative data, and acceptance of 
researcher involvement and perception. 

Often in social science research there are mixed 
modes, drawing on both traditions. For instance, it 
would be wrong to say that objectivity is not impor- 
tant for qualitative research, and that empirical ‘facts’ 
are not essential for a successful analysis of a management
situation, but at the same time frequently
social science research, say into organizations, takes
account of the subjective constructions of the participants
(e.g. the meanings they give to the phenomena
they might define in an interview or a questionnaire), 
as well as of the political or ideological dimensions 
of any interpretation of human behaviour (e.g. with 
reference to capitalism or feminism). Often, survey 
work entails such mixed approaches, as samples are 
identified and investigated with attention to statisti- 
cal representativeness, but evidence (say in the form 
of extended text from open questions or follow-up 
interviews) is analysed thematically in ways inap- 
propriate for significance testing, and interpreted 
with an emphasis on the private constructions par- 
ticipants place on meanings. 


Qualitative research 

Qualitative research stresses ‘understanding’, emphasizes
context, sees the social world from the point
of view of the actor, human behaviour from the 
actor's own frame of reference. As Bogdan and 
Taylor (1975) say, the positivistic approach empha- 
sizes facts or causes of social phenomena rather than 
subjective states of the individual. Mellon (1990) 
calls this type of research ‘naturalistic’, an in-depth 
study of people and situations and events, challeng- 
ing researchers to establish rapport with the situation, 
maintaining objectivity without alienating informants,
and perceiving theory from a welter of fascinating 
and conflicting information. She states that natural- 
istic studies focus on viewing experiences from the 
are critical to a full understanding of the phenomena
concerned. Differences which emerge between ob- 
served and expected data present the researcher with 
issues of statistical significance. This should be
built into the research design from the start, with 
appropriate tests. For example, the hypothesis that 
‘there is no significant relationship between the
number of reviews a book receives and the inclusion 
of that book in the library’ may, after appropriate 
data have been gathered, and statistical tests carried 
out, receive confirmation or not (i.e. the null
hypothesis is retained or rejected). Testing by parametric (e.g. t
or F tests) or non-parametric (e.g. chi-square) meth- 
ods may follow naturally. Important also in this 
situation are the instruments used to measure the data 
(e.g. survey) and the extent to which the scales and 
criteria used for measurement are valid (i.e. suited to 
the task) and reliable (i.e. replicable). 
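
As a minimal sketch of how such a test might be carried out, the following Python fragment computes Pearson's chi-square statistic for a 2 x 2 table relating review coverage to library holdings. The counts, the function name, and the division of books into 'many' and 'few' reviews are hypothetical and purely illustrative. The critical value 3.841 is the standard chi-square value for one degree of freedom at the 5 per cent level, and no continuity correction is applied.

```python
# Hypothetical counts: rows = review coverage, columns = holding status.
observed = [
    [30, 10],  # books with many reviews: held, not held
    [20, 40],  # books with few reviews:  held, not held
]


def chi_square_statistic(table):
    """Return Pearson's chi-square statistic for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            # Expected count if reviews and holdings were unrelated.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (count - expected) ** 2 / expected
    return statistic


# Chi-square critical value for 1 degree of freedom at the 5% level.
CRITICAL_VALUE_DF1_ALPHA_05 = 3.841

statistic = chi_square_statistic(observed)
reject_null = statistic > CRITICAL_VALUE_DF1_ALPHA_05

print(f"chi-square = {statistic:.2f}, reject null hypothesis: {reject_null}")
```

With these invented figures the statistic exceeds the critical value, so the null hypothesis of no relationship would be rejected; with real data the conclusion depends, of course, on the validity and reliability of the measures used.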

The design of such research is often called ‘experimental’
because it resembles scientific experimentation. The 
purpose is often to test the hypotheses or educated 
guesses (e.g. that the use of CD-ROM in libraries 
increases the number of requests for inter-library 
loans) that have arisen within the theoretical frame- 
work of ideas and methodologies and from 
experiential knowledge of the workplace. The con- 
ventional design for experimental research is, most 
simply, to elicit responses from or identify behav- 
iours of participants when exposed to particular 
factors. Examples might include the ways users use a 
catalogue after instruction or a television audience 
think about an issue after watching a programme 
about it. More complex are pretest-posttest designs 
in which the researcher measures before and after 
exposure, making sure to control the susceptibility to 
exposure of those taking part. Classically, a compari- 
son can be made between one group which is exposed 
(the ‘experimental’ group) (say, to user education) 
and another group which is not (the so-called ‘con- 
trol’ group), enabling the researcher to investigate 
whether the exposure does indeed have the assumed 
effect. 
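
The logic of a pretest-posttest comparison with a control group can be sketched in a few lines of Python. The scores, group sizes, and the `mean_gain` helper below are hypothetical, and the simple difference in mean gains is only one plausible way of summarizing such a design, not a full analysis.

```python
from statistics import mean

# Hypothetical catalogue-search scores (0-10) before and after the study period.
experimental = {"pre": [4, 5, 3, 6], "post": [7, 8, 6, 8]}  # received user education
control = {"pre": [4, 5, 4, 5], "post": [5, 5, 4, 6]}       # received no user education


def mean_gain(group):
    """Average change in score from pretest to posttest for one group."""
    return mean(post - pre for pre, post in zip(group["pre"], group["post"]))


# The control group's gain estimates change that would have occurred anyway,
# so subtracting it isolates the change associated with the exposure.
effect_estimate = mean_gain(experimental) - mean_gain(control)

print(f"experimental gain: {mean_gain(experimental)}")
print(f"control gain:      {mean_gain(control)}")
print(f"estimated effect:  {effect_estimate}")
```

In practice the estimated effect would then be subjected to a significance test, and the comparability of the two groups would need to be argued in the research design.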

Not all situations lend themselves to this research
design, particularly if the variables involved are not 
easy to measure in this manner. There may also be 
factors at work in the situation which make it inap- 
propriate to characterize it as one where independent 
variables influence dependent variables (a tradition- 
ally effective way to test the causality of hypotheses). 
Moreover, such research assumes that social facts 
have an objective reality rather than being socially 
constructed, and that variables can be objectively 
identified and measured rather than being complex, 
woven into the fabric of meanings and perceptions 
participants have and share, and so difficult to 
elicit. The research hinterland is that of a contrast 
between two paradigms, the hypothetico-deductive
paradigm, with its quantitative approach, and
the holistic-inductive paradigm, described and 
discussed below. 
evidence. A major decision is how far to structure
interviews (e.g. by schedules), particularly if this
predetermines the outcomes. Yet interviews also
provide a wealth of ethnographic information: what
respondents think about the facts, how they put their
ideas and feelings into words, how they present
themselves to the researcher during the interview. 
With participant observation the researcher becomes 
involved in the lives (e.g. workplace, home lives) of 
the respondents (e.g. visiting an information service 
to observe staff or users), an approach which has the 
advantage of close-hand experience and the disadvantage
of influencing what goes on. For instance,
observation of staff, particularly if unobtrusive (i.e.
they do not know you are there or who you are), can
lead to shyness or suspicion or fears about confidentiality.
The very issues where probing in-depth has
greatest potential, e.g. into job satisfaction at work,
are, in times of recession, performance and appraisal,
most sensitive to investigate. It is sometimes hard to
convince respondents that the research is uncon- 
nected with any formal evaluation from line 
management. 
Beyond its definition and major methods, qualitative
research is also distinctive in the ways in which the
information or data are analysed. Researchers
often find themselves overwhelmed by
information and so there is a need to impose order. 
Transcripts from field notes or interviews (especially 
tape-recorded interviews) can be extensive. A char- 
acteristic approach is to identify important and 
recurring themes (say, activities concerned with user 
service, expressed attitudes about low morale, focal 
areas of human interaction like team dynamics, beliefs
about the organizational culture). Themes should
be organized into coherent and linked structures so
that they hold together in relation to the world of
knowledge and practice they are intended to illuminate
and have a convincing mutual relevance.
Frequently the evidence is then arranged into 
patterns — say, views expressed by information users
about aspects of the service, behaviours revealed in
response to particular management styles, the effects
of training provision on staff self-esteem, situations 
consensually identified as stressors or pressure points, 
evidence of where people agree or disagree about 
norms. It may be important to accommodate into this 
scheme divergent and contradictory evidence. There 
are times when, in interviews, the respondent says 
one thing but conveys an opposite meaning through 
tone or body language. Many researchers devise a 
coding system so that particular themes or reactions 
can be consistently recorded and flagged and brought
together more reliably and speedily during the data
analysis stage. Such a system may pick up ideas or
concepts, beliefs or reactions, tone or unconscious
gesture. Inter-relationships between themes, patterns,
and coded categories should evolve, building up an
understanding of the situation. It is this understanding
of the situation which the researcher hopes to


perspective of those involved — librarians, information
professionals, information users. The roots of
this approach lie in ethnography (the study of human
behaviour in society) and anthropology. Research
into information is ethnographic, for instance, if it 
examines the ways in which individual users search 
for information, or the extent to which they find it 
meaningful and relevant to their situation, or how 
they represent their information-gathering or their 
valuation of relevance. 

Such naturalistic or qualitative research is con- 
ducted in ways rather different from those which 
characterize the experimental or quantitative 
approach. Often research starts small and moves to 
and fro as the researcher defines and redefines the 
area and problem to be considered. Expectations and 
assumptions may have to be reviewed and removed. 
Rather than a hypothesis as starting point, such re- 
search often involves familiarization with the people 
and situation. This is what Bourdieu termed the 
‘habitus’, the social setting of the actor, his or her 
way of talking and doing, the inside-head meanings 
and representations in language. It is ‘a shared body 
of dispositions, classifications and schemes, not just 
cumulative history but the source of objective practices
and their subjective generative principles (eg
how we explain things)’ (Jenkins, 1992). The habitus 
is most fully understood within its ‘field’, that struc- 
tured system of social positions, for individuals and 
institutions, which defines the situation for their 
occupants. Fields include education, class, political 
structures, gender, and organizational cultures. They 
draw on the systematic knowledge and belief struc- 
tures which are usually called paradigms and 
ideologies, and form critically important aspects of 
the context within which qualitative research is car- 
ried out. 

The methods used in such research include 
documentary study, interviews, and participant 
observation. These are ways of eliciting information 
at first hand and in-depth from respondents or in- 
formants. Documentary study or the analysis of 
personal documents (e.g. letters) can be quantitative 
(e.g. the content analysis of newspapers) but often 
takes account of the ideological assumptions in the 
documents (e.g. assumptions about deviancy in offi- 
cial papers or about the decision-making process in 
committee minutes). Effective analysis requires of 
the researcher a fine conceptual grasp of problems, 
an ability to make one's own interpretations in the 
context of the many already there, and an awareness 
of the assumptions being made. 

Interviews are widely used to elicit not only 
information about respondents but also what they 
think of an issue or situation. They are useful mecha- 
nisms for probing not just behaviour and experience 
but also opinions, values, beliefs, and feelings. 
Positivistic approaches can be used in interviewing, 
as when facts about the world are being sought or 
when variables are being developed and clothed with 
values. The assumption that facts are truly objective
is often difficult to prove in the social sciences where 
facts are expressed in language which itself is socially
constructed and value-laden. Much of the
discourse of professional life is impregnated with
connotations indicative of these values. Discussion
of a marketing approach to library service or ‘service 
values’ in a fee-based information economy reveals 
widespread use of these connotations. Analysts of 
communication and media language refer to gender 
and class and politics as fertile arenas for ideological
meanings. In qualitative research, therefore, which
places emphasis on the meanings which respondents
give to the information they provide to researchers, it
is important to take these factors on board. Researchers
can identify such assumptive meanings in the
documentary evidence they examine, and in inter- 
view and interpersonal discourse in the workplace. 
Associated with the issue of facts and values is
that of ‘conscious partiality’. Working from the position
that research may not be value-free, and that the
evidence obtained may be ideologically biassed and
compatible with a context in which social conflict is
normal and where respondents are likely to reveal
(knowingly or not) a ‘false consciousness’, consciously
partial research will actively adopt a position
of advocacy. Typical of such research is some of the
work in feminism and environmental studies. It can
be seen in library and information studies in research
on human resource management, gender and the
‘glass ceiling’ effect on female employees, and in
research on environmental and community information
where advocacy and advice, politically informed
and otherwise, may be considered appropriate. More
broadly, it is useful for researchers to consider the
wider historical, sociological, and political approaches
to information provision, when investigating
subjects like concentration in the communication
industry and the extent to which hegemonistic elites,
nationally and internationally, may arguably exercise
excessive influence.


Research 5 

Research experience is usually acquired in piecemeal
ways, say by taking part in survey or observation
work, or conducting a documentary study as part of
a discipline-based degree. Increasing attention is
being given to the range of techniques available for
both academic and practitioner-based research.
Reasons for this range from the proliferation of research
methods courses in higher education to the
pressure to be seen to be managing at quality levels
in library and information services. Consultancy and
the infrastructure of funded research have also
acquired a higher profile in recent years.

In consequence it is useful regularly to review
what research techniques are available, and examine
the literature (articles, theses, reports) to see what
is being carried out. Particular fields, like management
information and decision support systems, and

communicate to the reader of the research at the end
of the day.

The emphasis on getting to know both the actors 
and the context, both the facts and the values of a 
situation, has led people to call this kind of research 
‘holistic’. It is also called ‘inductive’. At its simplest, 
this means that one proceeds from example to gener- 
alization (rather than the other way round which is 
deduction). In naturalistic research, it is a familiar 
approach to start with an idea of the research prob- 
lem, perhaps a rough model of interrelationships 
between factors (say, between capitalism and the 
deregulated information marketplace, or between 
the availability of electronic information and the 
kinds of topics writers in an academic discipline 
most often write about), build up evidence of this 
problem or model, deconstruct and redefine the model, 
clarify the issues, excise some research avenues and 
introduce others. The process is often recursive, as 
the growing body of evidence (i.e. exemplary
material) provides increasing insight into the nature of the
problem and the exact manner in which it asks to be 
researched. Considerable self-discipline is needed to 
keep the original research claims in view, particu- 
larly when evidence builds up, subject boundaries 
get blurred, and the plausibility of the original model 
gets thrown into doubt. Even when the inductive 
process moves along smoothly, it is rare in the social 
sciences that one is seeking to confirm a ‘law’ in the 
scientific sense. 

This inductive process is, in effect, theory gen- 
eration. Ideas and themes and patterns are after all 
identified and organized so that a broader, over- 
arching theoretical proposition can be suggested. 
This may be macro, in the sense that it applies to 
society or a professional or subject domain as a 
whole, or micro, in that it refers mainly to one 
organization or type of organization or situation. 
One of the challenges with qualitative research is the 
extent to which one can generalize from it. By its 
nature it captures the idiosyncratic and richly tex- 
tured nature of particular situations or respondent 
group (most typically in case study research), mak- 
ing it difficult to generalize from and replicate. It is 
for this reason that qualitative researchers have found 
terms other than generalizability: 'fittingness' is one, 
conveying the degree to which the situation studied 
matches other situations in which researchers might 
be interested, and offers a realistic and feasible ap- 
proach to research in that domain. More detailed is 
the idea of ‘comparability’ or ‘translatability’, which 
focuses on the extent to which components of a 
research study (like the concepts, recurring features 
of the setting, or dominant characteristics of re- 
spondents) can be regarded as well and fully enough 
defined and discussed to enable other researchers to 
use the results, or extrapolate from them, in further 
studies carried out in comparable ways and areas. 
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reliability. On the most general level this has close 
bearings on how objectively or subjectively the re- 
search has been designed and carried out. In more 
detail, validity exists if the research technique tests 
what the researcher wanted it to test, if it was really 
suitable. It should define the relationships between 
variables and enable the investigation to hold up and 
be regarded as robust and true by other researchers. 
The validity of the technique depends on how well it 
measures the specific factors of the research area (eg 
forms of semantic differential or Likert scales for 
attitude research, or types of content analysis for the 
coverage of news items in the media). Expert opinion 
and predictive strength are also characteristics of 
validity. 

Reliability, on the other hand, is the extent to 
which the technique accurately and consistently meas- 
ures whatever it measures. It should demonstrate 
stability over time, not easy when, in qualitative 
research, control is difficult to ensure. Reliability 
will also depend on internal consistency, i.e. the 
extent to which the critical factors or ideas or 
constructs hold together under examination by 
the technique. Reliability is often defined as 
‘replicability’, so that, were the researcher to reiter- 
ate the research, or were it to be conducted at some 
other time or place, the findings, though not the 
same, would demonstrate an intellectually coherent 
comparability, even if rival findings emerged. Both 
validity and reliability have more rigid connotations 
in experimental research, since they apply to hypo- 
thetico-deductive and law-based models. It is useful 
to distinguish between positivistic and qualitative 
validity: the first asks whether the testing instrument
measures what it is supposed to measure, while the
second asks whether the researcher has gained
adequate access to the knowledge and meanings of the
respondents. Similarly for reliability where the
positivistic approach is to ask whether the measure 
will throw up the same results on different occasions 
and the qualitative approach is to consider whether 
similar observations will be made by different re- 
searchers on different occasions. Often, of course, 
mixed methods are employed, say for data collec- 
tion, a hybrid approach called triangulation when it 
allows the researcher to use one technique to check 
or confirm another. 
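
The internal consistency mentioned above is often quantified, for attitude scales of the Likert type, with Cronbach's alpha. The text does not name this statistic; it is introduced here only as one common illustration. The sketch below uses Python's standard library, and the function name and the response data are hypothetical.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a scale.

    `items` is a list of item columns; each column holds one score per
    respondent, in the same respondent order.
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Sum each respondent's scores across all items.
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(col) for col in items)
    total_variance = variance(totals)
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


# Hypothetical answers of five respondents to three 5-point Likert items.
likert_items = [
    [4, 5, 3, 2, 4],
    [4, 4, 3, 2, 5],
    [5, 5, 2, 1, 4],
]
alpha = cronbach_alpha(likert_items)
print(f"Cronbach's alpha = {alpha:.2f}")
```

Values closer to 1 suggest that the items hold together as a single construct; any cut-off for an acceptable value is a convention rather than a rule.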


Conclusion 

The range of research applications in information 
and library work today has widened and deepened 
general awareness of the need for systematic research
education and training. In academic and practitioner- 
based environments, there is a professional and 
personal incentive to carry out research, to be seen 
to have mastered fundamental research skills, and 
to be carrying out research to give status and points 
ratings to library, information service, or academic 
department. Competition between in-house and con- 
sultancy-sourced research is often critical, as prior 


performance measurement, are typical. Some tech- 
niques are more appropriate to one task than another; 
for example, sampling and significance testing for 
user studies, pricing theory for budgeting, opera- 
tional research like queuing and transportation for 
collection management and distribution, and model- 
ling for international and comparative studies of 
information policy. Introductions to applications are 
becoming more popular on training courses where 
advantages can be seen. Software packages which
enable statistical display, analysis and forecasting 
are increasingly available to help the researcher. 
Choosing from among research techniques is 
important. It is useful to define the problem, consider
the approach, and identify the appropriate research 
technique. For instance, if the research is intended to 
find out how people behave in public, watching them 
is the approach and observation the technique. To 
find out what people think, asking them is the ap- 
proach, and techniques include interviews and 
questionnaires (especially if the questionnaires have 
attitude-scaled questions). To discover trends in tex- 
tual material, the approach may well be systematic 
tabulation, and the technique content analysis, ap- 
propriate for bibliometric and citation studies. For 
understanding a situation with unique and complex 
features, the approach is a detailed and holistic in- 
vestigation and the technique the case study. Some 
techniques may be identified as appropriate at face 
value, like using diaries to investigate how information
personnel behave in private (e.g. their daily/weekly
work patterns), but may present associated
difficulties such as cooperation, confidentiality, and 
trust. 
Research in the information domain has often 
been characterized by surveys/questionnaires and
interviews. Statistical analysis has tended to be used 
in areas like information analysis and user studies. 
Case studies are popular for examining information 
services in detail. However, it is useful to consider 
how much wider the range of research techniques 
actually is, and familiarize oneself with applications 
in other social sciences which nevertheless have 
realistic application in the information domain. One 
of these lies in the ethnographic area with the use of 
diaries, autobiographies or life histories, and role 
playing, where much tacit or implicit information 
can be elicited about respondents’ views about or- 
ganizational behaviour and change. Another is that 
of attitudes and personal constructs, through the use 
of repertory grids and associated techniques. More 
quantitative are statistical sampling and significance 
testing in the analysis of information services, re- 
gression in financial control, and modelling in service 
provision. From marketing and media, there are
focus groups and panel studies. From management,
there are ‘inside the whale’ studies of the effects of
organizational culture and change.
Research — design, methodology, testing instru- 
ments, data analysis — should have validity and 
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experience and track record are scrutinized for 
evidence of proven research. 

There is also increased emphasis on resources to 
underwrite appropriate research ventures, on the 
rationales and design options for research, the tech- 
niques available, the theoretical and paradigmatic 
frameworks within which research is conducted, and
the political and diplomatic context which encour- 
ages research to take place. To that end, familiarity 
with a wide range of research guides and practical 
advice is essential for professional and personal de- 
velopment, for both intellectual and political reasons. 
Without this starting point, the research itself will 
never get done. 
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Abstract 

The electronic library is emerging as the library of the foreseeable future but its user aspect, particularly the 
usability, requires more research. This article describes the ELINOR (Electronic Library and Information Online 
Retrieval) developments at De Montfort University from the user's perspective. It firstly shows the main features
of the ELINOR user interface which illustrates how a user can find a document in the Electronic Library and
subsequently read the retrieved document on the screen. This is then followed by a discussion of the methodology
and findings of a user study based on a random sample of eight ELINOR users conducted in the Autumn term of 
1993. The user study included searching/reading/browsing tasks and a questionnaire. The former is a controlled 
experiment designed to gauge objectively the usability of ELINOR by comparing the use of the electronic books 
with that of printed books. The questionnaire shows the user's subjective reaction to ELINOR. Future work on the 
user study will expand the sample to include all the first- and second-year students doing the Business Information 
Systems Course at the University’s Milton Keynes campus. 


The methodology of the user study is inspired by 
two previous studies. One of them was conducted by 
the researchers working on the CORE (Chemistry 
Online Retrieval Experiment) Electronic Library in 
the USA (Egan et al., 1991). They compared the use
of printed journals with their equivalents stored in 
two electronic library systems, the SuperBook 
hypertext-based document browser and the Pixlook 
bit-mapped image-based system. Chemistry students 
were asked to perform five tasks ranging from search- 
ing to writing essays. Their findings show that both 
electronic systems had a large advantage over the 
printed system for search (including reading) and 
essay tasks. Rada & Murphy (1992) compared a 
number of hypertext-based books with printed books. 
Their findings, however, do not seem as promising 
as those from the CORE Electronic Library. Accord- 
ing to them, the computing experts were good at 
searching with electronic books but did better at 
browsing with paper. For novices, printed books 
gave the best performance for both searching and 
browsing. The somewhat different results from the 
two studies show that more work is required to 
obtain a better understanding of the user aspect of the 
electronic library. 


2. ELINOR electronic library 

The basic objective of ELINOR is to set up a large 
image database of complete books, journals, and 
course materials which can be directly accessed by 
students and teaching staff via Windows-based IBM 
and Macintosh PCs, and UNIX workstations distrib- 
uted across the various campuses of De Montfort 
University (DMU). Such an objective is being imple- 
mented in phases, firstly with a pilot to demonstrate 
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1. Introduction 
The electronic library is emerging as the library of the 
foreseeable future (Collier et al., 1993), but at present 
it still faces a number of technical and socio-economic 
challenges (Arnold et al., 1994; Wu et al., 1993; Zhao 
et al., 1993). The user aspect of the electronic library 
is clearly one of the most important areas of research 
because the electronic library, by definition, requires 
the user to search, read, and browse full-text
information which is solely in electronic form. Indeed,
librarians are concerned whether their users are will- 
ing to read books on the screen. The user aspect of the 
electronic library is, of course, not just limited to the 
usability of the electronic information but covers a 
wide range of issues of human-computer interaction, 
user education programmes, and finally, for an
academic library, the effect of the electronic library on
teaching, learning and research activities. 

ELINOR (Electronic Library and Information
Online Retrieval), which is one of the first electronic
library developments in the UK, has given the user
aspect of the electronic library a high priority in the
research programme. In fact, from the outset, we 
included a user-friendly interface as one of the im- 
portant selection criteria for a commercial document 
image processing (DIP) system, on which the Elec- 
tronic Library System (ELS) could be based (Ramsden 
et al., 1993). Throughout the creation of the elec- 
tronic library database we conducted a number of 
small scale user studies to acquire the user's feed- 
back, which helped to enhance database organization. 
The most recent user study was conducted in the 
autumn term of 1993 based on a random sample of 
eight students. The present article will mainly dis- 
cuss the findings from this user study. 


35,000 pages. The documents were held mainly in 
TIFF (Tagged Image File Format) Group IV bit- 
mapped image format with contents and back- 
of-the-book index pages converted to ASCII texts 
for indexing and retrieval purposes. 


2.2 User interface 
The ELINOR ELS employs current graphical user 
interface (GUI) technology, the basic feature of which 
is WIMP, i.e. window, icon, mouse and pull-down 
menu. The GUI technology provides a number of 
advantages. For example, it is easier to learn than 
menu or command-based interfaces. It provides a 
relatively common interface to a wide range of soft- 
ware. Data exchange between different software 
packages can also be readily achieved. 

The user interface of the ELINOR ELS may be 
divided into two broad aspects: finding a document 
and reading a document. 


2.2.1 Finding a document 

Browsing and searching the electronic library da- 
tabase are two possible routes to find a document, 
which may be either a known citation or any 
relevant document related to a topic. The brows- 
ing-related interface provides graphical presentation 
of the hierarchical structure of the database, i.e. 
from fileroom level down to the pages of a selected 
book in the fileroom, e.g. books > computing >

The user perspective of the ELINOR electronic library 


the feasibility for one course, the BA/BSc Business 
Information Systems (BIS). This pilot has applied a 
commercial DIP system, PixTex/EFS, to convert 
printed materials into bit-mapped images. Based on 
the DIP technology, we also developed a copyright 
management system including usage monitoring, 
print control, and report generation for publishers. 
Before discussing the user study, we will first pro- 
vide a brief description of the general architecture 
and user interface of the ELINOR ELS. The user 
study conducted last autumn was based on such a 
system configuration, which has been expanded and 
enhanced since then. 


2.1 Pilot system architecture 
The pilot system used the IBM RS6000 model 520 
workstation (AIX 3.2 operating system) as the data- 
base server. The machine had 32Mb of memory and 
3Gb of hard disk. The scanning station was a 486 PC 
with 8Mb RAM managing a Fujitsu 3096E scanner. 
The user stations were four 486 PCs with Windows 
3.1 interface, 8Mb RAMs, 40Mb hard disks, and 14 
inch SuperVGA screens. An HP LaserJet III printer
was attached to the scanning station. The pilot net- 
work architecture was based on Ethernet with the 
TCP/IP protocol which co-existed with Novell IPX. 
The pilot document database contained 50 textbooks,
and a limited number of lecture notes,
examination papers and course handbooks, totalling
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The searching and browsing routes may be com- 
bined to find a relevant document. For example, the 
user can first select certain drawers in the electronic 
library fileroom and then enter a free-text search clue 
to find relevant documents in the selected drawers. 


2.2.2 Reading a document 

The PixTex/EFS system links an OCR'ed text page of 
a document with its corresponding image page. The 
user can, therefore, switch the display between the text 
and image pages. Within the text pages, the user can
enter simple searches on terms, which is similar to the
‘Find’ function commonly available in word
processors. For image viewing, the user can easily drag the
page up/down and right/left. He can also enlarge or
reduce the size of the image simply by double clicking
the left-hand or right-hand buttons on the mouse. It is
also possible to zoom into a particular area of the
image page. Images may be rotated to 90, 180, 270,
360 degrees for viewing. Fig. 3 illustrates some of the
image manipulation functions. The vertical image on
the top left corner (10% scale in size) is first rotated 90
degrees clockwise to the horizontal position, and then
enlarged to 38% scale.

The user can flip through the pages by clicking
the Next or Previous buttons, or the Goto button to
select any given page. It is also possible to lock up to
four windows with each displaying an image page of
one or several books. The user can, therefore, com-
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computer architecture -> From logic to computers 
-> page 3 (see Fig. 1). 

The searching-related interface includes both 
DBMS and free text retrieval. The DBMS engine 
provides searching on structured fields similar to 
the library catalogue records in OPACs. The free 
text retrieval is based on neural networks, or more 
specifically, adaptive pattern recognition process- 
ing (APRP), which caters for both Boolean and 
natural language retrieval (Dowe, 1991). The natu- 
ral language searching allows for a query as long as 
128 characters, which can be words, phrases, and 
sentences. This searching is divided into two pro- 
cedures. A quick preliminary search on the neural 
network index results in a hit list with a preliminary 
score indicating the degree of relevance. The hit 
list can subsequently be rated to provide a more 
accurate rating score. The system allows fuzzy 
searching regardless of spelling errors in the user's 
query or OCR (optical character recognition) errors 
in the text. Users can adjust the degree of relevance 
(100 levels from 1% to 100%) to indicate to what
extent the query should match the source patterns 
in texts. The free text retrieval interface is illus- 
trated in Fig.2. As is illustrated by this example, the 
query contains a misspelled word, i.e. documant, 
but the system can still find 'document image 


processing' when the search exactness parameter is 
at 70%. 
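
The effect of the search exactness parameter can be imitated with a simple string-similarity threshold. The sketch below is not the APRP neural network method used by the ELS; it substitutes Python's standard `difflib.SequenceMatcher`, and the function names and the sample text are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Similarity of two words, from 0.0 (unrelated) to 1.0 (identical)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def fuzzy_find(query, text, exactness=0.7):
    """Return the words of `text` that match any query word at or above
    the `exactness` threshold, tolerating spelling or OCR errors."""
    hits = []
    for word in text.split():
        if any(similarity(q, word) >= exactness for q in query.split()):
            hits.append(word)
    return hits


page_text = "an introduction to document image processing systems"
print(fuzzy_find("documant", page_text))       # the misspelling still matches
print(fuzzy_find("documant", page_text, 1.0))  # exact matching finds nothing
```

Lowering the threshold admits more distant matches, which is the trade-off the adjustable degree of relevance exposes to the user.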


Figure 2. 
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comparing the use of the electronic books with that 
of printed books. It models two primary tasks of 
using a library. Firstly, the user knows a citation (e.g. 
a book on a reading list) and wishes to obtain the 
book from the library and an answer from the book. 
For the second task, the user has a topic or question 
in mind but he is not sure where the answer is. He 
therefore needs to search and read possibly several 
books in order to find the right answer. Each task 
includes two questions. Thus, four questions should 
be answered using electronic books and another four 
similar questions should be answered using printed 
books. To complete each question, the user probably 
needs to do a combination of searching, reading and 
browsing, but the tasks are simply referred to as 
reading tasks in the discussion. 


Questionnaire 

It is used to gauge the user's subjective reaction to 
the ELINOR ELS. The questionnaire consists of five 
sections. Firstly, it asks for some background infor- 
mation about the user, mainly computing experience. 
The two major variables for investigation are system
usability and usefulness which form the next two 
sections of the questionnaire. Each of the two vari- 
ables includes many sub-criterion variables. In 
addition, we are also interested to know what types 
of material are of interest to the user and finally the 
user is asked to make some overall comparisons
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pare different parts of a book or read several books 
on the screen. Because the user station runs on win- 
dows-based interfaces, the user can create notes with 
any windows-based editor while viewing a page. 
Fig. 4 illustrates that the user goes to page 45 from 
the contents page of the book Introducing systems 
analysis and then turns to the next page, i.e. page 46. 
In addition it shows the user has also selected the 
other book Introducing systems design, which may 
be read whenever necessary. 

The above two procedures, i.e. finding and read- 
ing a document, can, to a certain extent, be used 
interchangeably. Namely, the user can highlight a 
term or sentence in the text page being read and go on 
to request an additional search of other documents 
matching that new pattern. 


3. User study 

The methodology of the user study conducted in the 
autumn of 1993 and the subsequent analysis of the 
data and findings will be discussed in this section. 


3.1. Data collection 
We devised the following two methods to collect 
data for the user study: 


Reading tasks 
This is a controlled experiment designed to gauge 
objectively the usability of the ELINOR ELS by 


Figure 3. 
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3.3.1 Analysis of data from reading tasks 
For the reading tasks, three methods of data analysis 
are used: 


Mean and associated statistics 

The times taken to complete individual questions are
recorded and the means of the values are computed.
They are illustrated in Table 1. In addition, Table 1 
shows the standard deviation (Std dev) of the values 
to indicate their variation. The minimum (Min) and 
maximum (Max) values are also included in the 
table. 
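
The summary statistics listed above can be sketched in a few lines. The times below are hypothetical, not the study's data, and the sample standard deviation (dividing by n - 1) is assumed, since the article does not say which form was reported.

```python
from statistics import mean, stdev

# Hypothetical completion times in seconds for one question, one value
# per student (eight students, as in the sample); not the study's data.
times_eb = [95, 120, 80, 150, 110, 130, 105, 90]


def describe(times):
    """Return the summary statistics reported in a table such as Table 1."""
    return {
        "Mean": mean(times),
        "Std dev": stdev(times),  # sample standard deviation (n - 1)
        "Min": min(times),
        "Max": max(times),
    }


for name, value in describe(times_eb).items():
    print(f"{name:8s} {value:8.1f}")
```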

In Table 1 EB and PB represent electronic books 
and printed books respectively. As was discussed 
previously in the methodology of data collection, 
there are two reading tasks, each with two questions. 
For Task 1, i.e. finding answers from a given book, it 
is possible to record the time of finding the book and 
the time spent on finding the answer from the book 
separately. For the electronic book, the time of find- 
ing the book is simply that of searching for the book 
in the ELINOR Electronic Library, but for the printed
book it includes the time of searching OPAC and 
fetching the book from bookshelves (assuming the 
book is available in the library). With regard to Task 
2, it is difficult to separate the time of searching/ 
fetching from that of reading/browsing because the 
student normally has to do a combination of them, 
probably alternately until the right answer is found. 
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between, for instance, the usability of the electronic 
books and printed books, the usefulness of the ELS 
and the OPAC, and the electronic library concept and 
the conventional library. Almost all the questions are 
closed; the user simply selects a category from mul- 
tiple choices, e.g. very unsatisfactory, unsatisfactory, 
neutral, satisfactory, and very satisfactory. 


3.2. Sampling and experiment design 

We selected at random a sample of eight students 
from the 33 first-year students doing the BIS Course. 
They were asked to complete both reading tasks and 
the questionnaire. Before working on the reading 
tasks, the student subjects were given a half-hour 
introduction to the system and half an hour for hands- 
on experience of using the system. 

For the reading tasks, a within-subjects design 
was used, i.e. using the same subjects to compare the 
use of electronic books and printed books. Four 
students worked on electronic books while the other 
four were working on printed books. After that, they 
changed their positions. Immediately after complet- 
ing the tasks, the same eight subjects were asked to 
complete the questionnaire. 
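
The sampling and crossover arrangement described above can be sketched as follows. The student identifiers, the function name and the seed are hypothetical; the article does not say how the random selection was actually carried out.

```python
import random


def assign_crossover(population, sample_size=8, seed=None):
    """Draw a simple random sample and split it into two equal groups
    that use the two media in opposite orders."""
    rng = random.Random(seed)
    sample = rng.sample(population, sample_size)
    half = sample_size // 2
    return {
        "EB first, then PB": sample[:half],
        "PB first, then EB": sample[half:],
    }


# Hypothetical identifiers for the 33 first-year students.
students = [f"S{n:02d}" for n in range(1, 34)]
for order, group in assign_crossover(students, seed=1).items():
    print(order, group)
```

Because every subject works with both media, each student serves as his or her own control, and swapping the order between the two groups guards against a practice effect favouring whichever medium is used second.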


3.3. Data analysis 

The user study data is analysed using the statistical 
package, SPSS/PC+ (Bryman, A. & Cramer, D., 
1990; Foster, J.J., 1993; Freund, J.E., 1988). 


Figure 4. 
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two tasks. It also calculates the difference and ratio 
of the time on the electronic book and the printed 
book. These two statistics, particularly the ratio, 
show the extent of difference in the two sample 
means. T-tests give rise to the 2-tail probability (p);
if p < 0.05, the corresponding difference is
statistically significant.
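
The difference, the ratio and the paired t statistic can be sketched with the standard library alone. The times are hypothetical, not the study's data; instead of computing p itself (which a statistics package would report), the sketch compares t with the tabulated two-tailed critical value for 7 degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical times in seconds for the same eight students on one
# question; x = electronic book, y = printed book. Not the study's data.
x = [60, 75, 50, 90, 65, 80, 70, 55]
y = [100, 130, 95, 120, 110, 150, 105, 90]


def paired_t(x, y):
    """Return (difference of means, ratio of means, t statistic)."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return mean(x) - mean(y), mean(x) / mean(y), t


difference, ratio, t = paired_t(x, y)
# Two-tailed critical value of t for 7 degrees of freedom at p = 0.05.
T_CRITICAL_DF7 = 2.365
print(f"difference = {difference:.1f} s, ratio = {ratio:.2f}, t = {t:.2f}")
print("significant at 0.05" if abs(t) > T_CRITICAL_DF7 else "not significant")
```

A ratio below 1 means the electronic book was quicker on average, and a ratio above 1 that the printed book was quicker.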

As is illustrated in the table, the differences (both 
searching/fetching and reading/browsing) for Task 1 
are significant, i.e. too large to be attributed to chance. 
In other words, the data seem to indicate that to find


Table 1. Mean time (seconds) and associated statistics 


Task 1 
Finding answers from a given book 
Searching/Fetching 


Reading/Browsing 
Question 1 

EB 

PB
Question 2 
EB 
PB 


Task 2 

Finding answers from any relevant book 
Searching/Fetching/Reading/Browsing 
Question 3 


Question 4 
EB 
PB 


Therefore, the time shown for the second task is the 
total time spent on completing a question. 


Paired t-tests 
The paired t-tests are used to study the difference in 
the time spent on printed and electronic books. The 
objective is to determine if any difference shown from 
the sample is statistically significant, i.e. if it can be
generalized from the sample to the whole population. 
Table 2 illustrates the mean time (same as that in 
Table 1) spent on the individual questions of the 


Table 2. Difference in mean time (seconds) and paired t-tests

Difference (x-y)    Ratio (x/y)    2-Tail probability (p)
                    0.6            0.030
                    5.0            0.048
                    3.7            0.033
                    0.5            0.115
                    1.3            0.303
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Task 1 
Finding answers from a given book 
Searching/Fetching 
Reading/Browsing 
Question 1 
Question 2 


Task 2 
Finding answers from any relevant book 
Searching/Fetching/Reading/Browsing 
Question 3 187 
Question 4 822 
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2-Tail probability 
(p) 


warnings about the reliability of the results because 
the sample was small. Therefore, we will only use 
bar charts to show the data derived from the sample 
without attempting to generalize the results in this
article.
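
Why a sample of eight gives unreliable chi-square results can be seen from the expected frequencies. The sketch below tests hypothetical answers against equal expected counts and flags expected counts below 5, a common rule of thumb; the article does not say exactly which warning the package gave, so this is only a plausible illustration.

```python
def chi_square_uniform(observed):
    """Chi-square goodness-of-fit statistic against equal expected
    frequencies, with a flag for expected counts below 5."""
    n = sum(observed)
    expected = n / len(observed)
    statistic = sum((o - expected) ** 2 / expected for o in observed)
    return statistic, expected, expected < 5


# Hypothetical answers of eight students over five response categories,
# from "very unsatisfactory" to "very satisfactory".
counts = [0, 1, 3, 4, 0]
statistic, expected, too_small = chi_square_uniform(counts)
print(f"chi-square = {statistic:.2f}, expected count per category = {expected}")
if too_small:
    print("warning: expected counts below 5, so the test is unreliable")
```

With eight respondents spread over five categories the expected count per category is only 1.6, so the warning is triggered whatever the answers are.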

Owing to the length constraints of the article, we will
only illustrate the results of some of the variables
investigated. They are categorized into four sections
according to the structure of the questionnaire:
usability (Fig. 5), usefulness (Fig. 6 & 7), database
contents (Fig. 8), and general comments (Fig. 9).
Each variable for investigation includes five options
for the students to select. If an option is not selected
by any of them, it will not be displayed on the vertical
axis of the figures. For example, the first usability
variable (USABI) in Fig. 5 shows that the students
only selected three choices; the other two, i.e.
unsatisfactory and very satisfactory are not selected. These
bar charts are fairly self-explanatory. We only wish
to draw the reader's attention to the general
comments made by the eight students in Fig. 9. They
seemed to feel that the usability of the electronic
book is somewhat inferior to the printed book.
However, they welcome the electronic library concept
and regard the electronic library system as being
more useful than the conventional library OPAC.


4. Conclusions and future work 

The user study described above has investigated
some of the issues relating to user aspects of the
electronic library and has revealed some preliminary
yet interesting results. As has been shown in the
previous text, the data of the user study seem to
indicate that it is quicker to search and find a known
book from the Electronic Library than the
conventional library but it is slower to find an answer from
the electronic book than the printed book once the
books are retrieved. If the student is not aware which
book contains the right answer, there seems no
significant difference in the time of using the Electronic
Library or the conventional library in order to find
Table 3. Correlation 

[Table rows: Task 1, finding answers from a given book (Searching/Fetching; Reading/Browsing): Question 1, Question 2. Task 2, finding answers from any relevant book (Searching/Fetching/Reading/Browsing): Question 3, Question 4.] 


answers from a given citation, it is quicker to search 
the given citation from the ELS than it is to search the 
OPAC and fetch the book from the bookshelf. How- 
ever, after obtaining the book, students tend to spend 
less time finding answers from the printed book than 
the electronic book. For Task 2, i.e. finding answers 
from any relevant citations, the sample means show 
that it was quicker to use the electronic book to 
answer the first question but slower to answer the 
second one. However, neither difference is signifi- 
cant since the probabilities of both t-tests were greater 
than 0.05. This may indicate that there is no differ- 
ence between the time spent on electronic and printed 
books for the second task. 


Correlation 

We are also interested in the correlation of the time 
spent on printed and electronic books. This attempts 
to investigate whether the two time variables are 
positively or negatively correlated, or there is no 
relation. For instance, in the case of positive correla- 
tion, a student who is quicker than others at using 
printed books is also quicker working on electronic 
books. The correlation coefficient and its correspond- 
ing 2-tail probability are illustrated in Table 3. 

The data seem to indicate that there is no strong 
correlation of the time spent on electronic books to 
that on printed books for both tasks since |r| < 0.7. 
We cannot, however, reach such a conclusion confi- 
dently because all the associated tail probabilities are 
greater than 0.05. 
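
As an illustration of the calculation behind a table of this kind, the following minimal sketch computes a Pearson correlation coefficient and the corresponding t statistic. The timing values are invented for the example and are not the study's data; the function names `pearson_r` and `t_statistic` are likewise hypothetical.

```python
import math


def pearson_r(x, y):
    """Return the Pearson correlation coefficient of two equal-length samples."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two samples of equal length, at least 3 points")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for the hypothesis of no correlation (n - 2 degrees of freedom)."""
    if abs(r) >= 1:
        return math.copysign(math.inf, r)
    return r * math.sqrt((n - 2) / (1 - r * r))


# Invented times in seconds for eight students (NOT the study's data).
printed = [120, 95, 140, 110, 130, 100, 150, 105]
electronic = [150, 160, 145, 170, 155, 140, 165, 158]

r = pearson_r(printed, electronic)
t = t_statistic(r, len(printed))
print(f"r = {r:.3f}, t = {t:.3f} on {len(printed) - 2} degrees of freedom")
```

With eight paired observations there are six degrees of freedom, so the correlation would be significant at the two-tailed 0.05 level only if |t| exceeded about 2.447.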


3.3.2 Analysis of questionnaire 

The data from the questionnaire are categorical rather 
than continuous as illustrated previously by the ex- 
ample of different levels of satisfaction. Bar charts 
were used to illustrate the frequencies of different 
categories of a measured variable. Chi-square tests 
were used tentatively to measure if the difference 
between the different categories of a measured vari- 
able was statistically significant. The tests gave 
increase the percentage of searching behaviour in the 
task, it is conceivable that the difference in the time 
of task completion between the two types of libraries 
will decrease. At a certain point, it could be equally 
fast to use the Electronic Library to complete the task 
as to use the conventional library (e.g. Task 2 in our 
user study). In addition, the results of the two reading 
tasks seem to agree with the students' subjective 
Fig. 5 Usability 
the answer. A possible explanation for the findings is 
that the Electronic Library has an advantage over the 
conventional library for searching but the electronic 
book is not as good as the printed book for reading. If 
a task involves mainly reading behaviour (e.g. Task 
1 in our user study), it could be quicker to use the 
conventional library to complete the task (provided 
the book is available in the library). Suppose now we 


Fig. 9 General comments 
Fig. 6 Obtaining information 
reactions to the Electronic Library shown in the 
questionnaire. The usability of the bit-mapped im- 
age-based electronic book seemed to them somewhat 
inferior to that of the printed book. Still, they appre- 
ciated the concept of the electronic library and 
regarded the ELS as more useful than OPACs. 

The aforementioned findings are admittedly pre- 
liminary results because the size of the random 
sample is small. To obtain more reliable results, we 
are at present enlarging the sample to include all the 
first- and second-year students doing the BIS Course. 
Multivariate statistical analyses will also be applied 
to consider the effects of various factors or vari- 
ables, such as the user's computing and reading 
experience and the experience of using the Elec- 
tronic Library. Furthermore, the ELINOR ELS at 
present deals only with bit-mapped image-based 
electronic books. In the future, we plan to collabo- 
rate with another electronic library project, ELSA 
(Electronic Library SGML Applications) at DMU's 
Leicester campus to compare different types of 
electronic books with printed books. The findings 
of the user studies will be used to pilot new user 
interfaces of ELS and the design of more user- 
friendly electronic books. 
Abstract 


We have recently completed a survey of the use of hypertext systems in academic, public and special libraries 
within the United Kingdom. A questionnaire and both telephone and face-to-face interviews revealed that the 
largest application of such systems in academic libraries is the use of the World-Wide Web for networked 
document retrieval. This paper discusses the current usage of the World-Wide Web by academic library services, 
illustrating the range of facilities that libraries are starting to make available to their users. 


against the USA. Of those that were relevant, the 
majority of them described public-access POI sys- 
tems in libraries and the conclusion that such systems 
were one of the principal current applications of 
hypertext was confirmed by our subsequent data 
collection, which was carried out in two stages. 
Firstly, a postal questionnaire was distributed to 326 
academic, public and special libraries in the UK. 
Replies were obtained from 205 of these libraries, 
which represents a response rate of 63%. Then, 21 
face-to-face and 43 telephone interviews were car- 
ried out with representatives of organizations that 
had been identified from the questionnaire responses. 
In addition to POI systems, the data collection exer- 
cise identified a further application that was of 
particular importance to academic libraries: of the 74 
that replied to our initial questionnaire, no fewer than 
55 of them mentioned use of the World-Wide Web 
(otherwise known as WWW, W3 or, as here, ‘the 
Web’), and in this paper we present an overview of 
the uses that UK academic libraries are making of this 
novel type of networked information-retrieval sys- 
tem. The systems we describe are those that were 
operational as of September 1994: the current explo- 
sive growth in the usage of the Web means that at 
least some of the information here is likely to be out 
of date by the time that this paper is published. 


The World-Wide Web 
The Web is the result of a project initiated by Berners- 
Lee et al.5,6 at CERN (the European Centre for Particle 
Physics) in Geneva. This project defined the archi- 
tecture of a retrieval system for distributed 
information, i.e. one that allows information stored 
in host computers, or servers, to be accessed by 
client computers connected to the servers acrpss a 
wide-area network. In time, the term ‘World-Wide 
Web’ and its synonyms have come to be used to refer 
not simply to the retrieval system itself, but to the 
body of information stored on the Internet that is 
available to its users. 

In common with other systems designed for net- 
work retrieval, such as Gopher and WAIS (Wide-Area 
Introduction 

The last few years have seen growing interest in the 
use of hypertext as a user-friendly way of accessing 
textual and, latterly, multimedia information". 
Hypertext systems provide the facility for any two 
objects, or nodes, to be connected by a link that 
indicates the existence of a relationship of some kind 
between the two. It is this explicit representation of 
inter-object relationships, absent in other types of 
retrieval system, that enables users to control the 
retrieval process by navigational operations, in which 
they are able to select successive nodes, and instantly 
to access the information each contains, by activat- 
ing the links between them. 

The enthusiasm for the benefits of hypertext tech- 
nology that pervades the literature is reflected by the 
range of contexts in which hypertext has found prac- 
tical applications. Thus, hypertext systems have been 
used as online documentation systems in place of 
printed reference materials, both for computer soft- 
ware and hardware and for non-computerized 
machinery; as assistants in computer-supported co- 
operative work, particularly in collaborative software 
engineering; as tools for authoring and composition; 
and as platforms for the development of interactive 
multimedia applications, such as point-of-informa- 
tion (POI) systems and computer-assisted learning 
and computer-based training packages. With fund- 
ing from the British Library Research and 
Development Department, we have recently carried 
out a six-month survey of the usage of hypertext 
within the UK library community: in particular, we 
sought to determine the extent to which the general 
enthusiasm for hypertext has been translated into 
operational library systems‘. 

The first phase of our project consisted of a 
comprehensive review of the published literature on 
the use of hypertext in libraries, using Library and 
Information Science Abstracts, the Science Citation 
Index, the Social Science Citation Index and UnCover. 
Whilst a fair number of records were retrieved in 
these searches, very few of them related to opera- 
tional, as against experimental, systems in the UK, as 


most popular of these is Mosaic, versions of which 
are produced for Microsoft Windows, X Windows 
and Macintosh environments by the National Center 
for Supercomputing Applications. Each browser uses 
a distinctive method of interpreting the logical form 
of HTML-formatted documents and thus of present- 
ing them in physical form on the client computer's 
monitor; typically, however, objects are displayed to 
the user as ‘pages’ of text or graphics, and the sources 
of the hypertext links embedded within these objects 
are represented as highlighted buttons. Users can 
either follow the available links as they wish by 
selecting buttons with their mouses or keyboards, or 
enter the URL of any desired target that may then be 
accessed directly. Various keyword-based search 
engines are also available for the user to discover the 
exact location of particular objects. 

Striking figures recording the rapid increase in 
numbers of users of the Internet are commonplace; 
but since mid-1993, when the first version of Mosaic 
was released, the rate of expansion in the Web’s user 
base has been quite remarkable. For example, Berners- 
Lee et al.* note that there were 62 registered Web 
servers in April 1993 but that this had risen to no less 
than 1,248 by May 1994. We believe that there are 
four characteristics of the Web that have been princi- 
pally responsible for its rapid growth and widespread 
acceptance. Firstly, it is a multimedia information 
system, and thus has a facility for displaying objects 
of several non-textual types, and for preserving the 
unique ‘look-and-feel’ of objects emanating from 
different sources. Secondly, it is highly interactive, 
owing to the embedding of hypertext links within the 
body of information objects that is not provided by 
other networked information-resource detectors, such 
as Gopher. Thirdly, the Web is global in nature, thus 
permitting the unrestricted dissemination and use of 
sets of information objects of all types. Applications 
developed using software such as HyperCard, 
Toolbook and Guide may be made accessible on 
campus-wide networks: information marked up in 
HTML format, on the other hand, may be made 
accessible to the world. Moreover, browsers such as 
Mosaic provide the facility for the user to view the 
HTML source code of any Web page, and then copy 
such portions of it that they wish to reuse in the 
creation of their own page: in this sense, it may be 
argued that all work on Web resources can be said to 
be truly collaborative. Finally, and most importantly, 
the Web has achieved a degree of user-friendliness in 
its interface that is, perhaps, more reminiscent of a 
computer game than that of a conventional software pack- 
age. The ease of use derives from two sources: the 
lack of any requirement for users to understand the 
differences between the various protocols that are 
used in the transfer of objects of different types 
(since whatever type of browser is used to view a 
Web page, links to other HTML-formatted docu- 
ments, to FTP sites, to Telnet destinations, to Gophers 
and to WAIS indexes are all represented in the same 


Aslib Proceedings, vol.47, no.1 


The use of the World-Wide Web in UK academic libraries 


Information Server), the Web is implemented using 
server and client programs that are run by each 
computer according to its status in the system: how- 
ever, there are at least three novel aspects of the 
Web's specification that distinguish it from systems 
with related aims: these are the address system, the 
network protocol and the markup language. These 
aspects are discussed below. 

The address system allows any information ob- 
ject (such as a document, a menu of options, a data 
file or an image, inter alia) stored on any computer in 
the network to be given a reference that is both 
unique and independent of the network topology. 
Such a reference takes the form of a string made up of 
a series of parameters, and is known as a Universal 
Resource Identifier (URI) or Uniform Resource 
Locator (URL). 
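
To make the idea of a reference built from a series of parameters concrete, the following minimal sketch splits a URL into its parts using Python's standard `urllib.parse` module. The host names and paths are invented for illustration and do not refer to any server described in this paper.

```python
from urllib.parse import urlsplit

# A hypothetical URL for a library page (not a real address).
url = "http://www.example.ac.uk:80/library/opac.html#hours"
parts = urlsplit(url)

print(parts.scheme)    # the protocol to use, here "http"
print(parts.hostname)  # the server holding the object
print(parts.port)      # the network port on that server
print(parts.path)      # the location of the object on the server
print(parts.fragment)  # a named position within the object

# The same address syntax covers other protocols, such as Telnet.
telnet_parts = urlsplit("telnet://opac.example.ac.uk")
print(telnet_parts.scheme, telnet_parts.hostname)
```

The scheme at the start of the string is what allows one address syntax to refer to objects served by different protocols.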

In general, the client-server transfer of objects of 
different types is enabled by the specification of 
corresponding protocols: a protocol is a set of stand- 
ard definitions that allows the communicating 
computers to recognize the type of objects with 
which they are dealing and the method by which they 
should be transferred. In common with the Gopher 
and WAIS systems, the Web defines its own proto- 
col: HyperText Transfer Protocol (HTTP). This 
allows information to be retrieved with the effi- 
ciency necessary for activating, in acceptable lengths 
of time, hypertext links between objects that are 
stored on widely-dispersed computers. The client- 
server connection that is initiated in the course of any 
retrieval operation conducted in accordance with 
HTTP is held only for the duration of that single 
operation, and it is this 'statelessness' that makes a 
protocol such as HTTP more appropriate for naviga- 
tional retrieval than, for instance, FTP. Despite its 
name, the objects transferred by HTTP may be of 
any type: text, hypertext, images, etc. 

The logical markup language is used to create 
documents in which the sources and targets of 
hypertext links are defined in a standard manner. The 
HyperText Markup Language (HTML) is based on 
the Standard Generalised Markup Language (SGML), 
and similarly allows authors to use tags to define the 
structural features of documents such as titles, head- 
ings and lists, as well as to define the location of 
hypertext buttons and the URLs of their targets. 
Given the existence of support for the address system 
and network protocol described above, authors are 
thus enabled to create links between separate docu- 
ments stored in different locations rather than (as is 
often the case with microcomputer-based hypertext 
authoring systems) merely within documents. All 
Web clients are required to understand HTML in 
order to display hypertext documents, from which 
users may navigate to other objects anywhere on the 
network (not necessarily HTML-formatted them- 
selves) by activating the embedded links. 
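
The way in which tags define both document structure and link targets can be sketched as follows. The HTML fragment and its addresses are invented for illustration, and the `LinkCollector` class is a hypothetical helper built on Python's standard `html.parser` module; it is not part of any browser discussed here.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the target URL of every anchor tag in a document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Anchor tags carry the target of a hypertext link in "href".
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# A hypothetical library home page (addresses are not real).
page = """<html><head><title>Library home page</title></head>
<body><h1>Library</h1>
<ul>
<li><a href="hours.html">Opening hours</a></li>
<li><a href="telnet://opac.example.ac.uk">Library OPAC</a></li>
<li><a href="gopher://gopher.example.ac.uk/">Campus Gopher</a></li>
</ul></body></html>"""

collector = LinkCollector()
collector.feed(page)
print(collector.links)
```

The three targets use three different protocols, yet each is marked up in the same way and would be presented to the user as the same kind of button.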

Several browsers, or readers, are available that 
act as interfaces to the Web's client software: the 
classes according to five attributes: whether its li- 
brary maintains (or is provided with) its own home 
page on the Web; whether the library's page makes 
available local library information in HTML format; 
whether the library's page provides HTTP links to 
external Web servers; whether the university's server 
provides a link to a Gopher that includes local library 
information; and whether the university server pro- 
vides a Telnet link to the local library's OPAC. 
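
The assignment of a university to a class can be sketched as a simple decision procedure. The sketch below follows the class descriptions given in this paper (A1 to A4 and B1 to B3); the function name `classify` and its parameter names are hypothetical. Two points are assumptions rather than part of the published scheme: a Gopher link takes precedence over a Telnet link when both exist, and a library with much local HTML information but no external links, a combination the class descriptions do not cover, is left unclassified.

```python
def classify(home_page, local_html, external_http, gopher_link, telnet_link):
    """Return the class code for one university, or None if no class applies.

    local_html is "much", "some" or "none"; the other arguments are booleans.
    """
    if local_html not in ("much", "some", "none"):
        raise ValueError("local_html must be 'much', 'some' or 'none'")
    if home_page:
        if local_html == "none":
            return "A4"
        if external_http:
            return "A1" if local_html == "much" else "A2"
        if local_html == "some":
            return "A3"
        # Much local HTML but no external links: not covered (assumption).
        return None
    if gopher_link:
        return "B1"  # assumed to take precedence over a Telnet link
    if telnet_link:
        return "B2"
    return "B3"


print(classify(True, "much", True, False, True))     # A1
print(classify(False, "none", False, False, True))   # B2
```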


Table 1. A classification scheme for universities 
according to their libraries’ use of the Web. 


KEY: 

Home page        ✓   Library has its own home page 
                 ✗   Library does not have its own home page 

Local: HTML      ✓✓  Much local library information in HTML format 
                 ✓   Some local library information in HTML format 
                 ✗   No local library information in HTML format 

External: HTTP   ✓   Some HTTP links to external information 
                 ✗   No HTTP links to external information 

Local: Gopher    ✓   Gopher link to local Gopher 
                 ✗   No Gopher link to local Gopher 

Local: Telnet    ✓   Telnet link to local OPAC 
                 ✗   No Telnet link to local OPAC 

A value of ‘?’ indicates that the existence or other- 
wise of the links specified at the left of the row is not 
considered when assigning a university to the class 
whose code appears at the top of the column. 


Applying the classification of Table 1, we identi- 
fied the following groups of academic libraries. 


Class A1 (with a library home page containing much 
local library information in HTML format, and some 
HTTP links to external information): Greenwich, 
Keele, Loughborough, Surrey and York. York's home 
page is depicted in Figure 1. 

Class A2 (with a library home page containing some 
local information in HTML format, and some HTTP 
links to external information): Bath, Dundee, Edin- 
burgh, Exeter, Leeds, Oxford (Radcliffe Science 
Library & Bodleian Library), Queen's Belfast, Sun- 
derland, Warwick and Westminster. 
way); and the hypertext functionality provided by 
the HTTP, which enables users to retrieve far-flung 
and disparate objects, rapidly and successively, at 
the click of a mouse. 

A wealth of information about the Web is now 
available. The papers already cited by Berners-Lee et 
al.5,6 provide brief non-technical overviews; useful 
introductory material may also be found in Hughes7, 
Kleiner8, Schatz and Hardin9 and White10. The Mo- 
saic browser is discussed in Andreessen and Bina11, 
and a comprehensive review of HTML is presented 
in Barry12. Furner-Hines and Willett’ detail some of 
the many sources of information that are available as 
HTML-formatted documentation on the Web itself. 


Use of the World-Wide Web in UK libraries 

The potential of the Web as a medium for the presen- 
tation of library-related information is discussed by 
Kelly, Morgan", Powell and Price-Wilkin!'é. All 
users of the World-Wide Web may be identified as 
members of at least one of two groups: active users, 
who are involved in the authorship and maintenance 
of original Web pages and links, and passive users, 
who access pages and navigate links created by 
others. As far as can be ascertained from the inter- 
views conducted as part of our survey, at the time of 
writing (mid-September 1994) active use of the Web 
by libraries in the UK is restricted to those in the 
academic sector: while 55 of the 74 academic librar- 
ies responding to our postal questionnaire mentioned 
use of the Web, it was mentioned by only 3 of the 94 
responding public libraries and only 9 of the 37 
responding special libraries. 

In time, as awareness of the utility of the Internet 
continues to increase outside the universities, it is to 
be expected that home pages produced by public and 
special libraries will start to appear on the Web, just 
as they already have in the USA and Europe. For the 
moment, however, UK use of the Web outside the 
academic sector is restricted in extent, and almost 
wholly passive in nature. In one of the very few 
public libraries whose representative expressed in- 
terest in the Web during an interview, the plan is to 
designate a single PC for Internet access, initially for 
demonstrations by members of staff, and then to wait 
and see in what manner public imagination is caught. 
It is expected that business information will be the 
most popular of targets. One worry stems from the 
tendency for individual Web sessions to be lengthy 
affairs, partly as a result of the often serendipitous 
nature of hypertext navigation, and partly as a result 
of the user-friendliness of the Mosaic interface. 


A classification of active Web use in academic 
libraries 
The UK's universities may be classified according to 
the extent to which their libraries make active use of 
the Web's facilities, and a possible classification 
scheme is defined in Table 1. This table specifies 
how a university should be assigned to one of seven 
the development of Web servers has become a matter 
of top priority. It has become common for university 
libraries, along with computing services departments, 
to be entrusted with this task, and university libraries 
are the first organizations in the UK library commu- 
nity either to create original Web material, or to 
make tentative use of the vast range of external 
resources accessible via the Web. At some universi- 
ties, extensive library-related Web material has been 
developed under the supervision of Computing Serv- 
ices departments, and/or as one component of a 
CWIS (campus-wide information server) initiative, 
rather than by library staff in isolation. Some of these 
institutions (e.g. Bristol, Edinburgh, Liverpool and 
Wales (Cardiff)) have chosen merely to convert ex- 
isting CWIS structures into HTML format, while 
others (e.g. Keele and York) are in the process of 
developing a CWIS specifically for implementation 
on the Web. Many such projects have begun quite 
recently, and it is too soon to predict with any cer- 
tainty the respective degrees of impact that library 
staff (on the one hand) and computing staff (on the 
other hand) will have on future developments. In the 
'new' universities especially, the situation is simpli- 
fied as a result of the tendency for library and 
computing services to be amalgamated into one de- 
partment, such as North London's Computing, 
Library and Media Services and Westminster's In- 
formation Resource Services. 


Selected features of library Web pages 
1. HTTP links to local library OPACs. A central 
feature of most library Web servers is the link to the 
OPAC. Such links generally rely on the Telnet proto- 
col: although the source of the link is presented to the 
user as a button embedded within an HTML-formatted 
document, activation of the link initiates a Telnet 
session that will continue independently of subsequent 
Web navigation, until specifically terminated by the 
user. There are a few servers, however, that maintain 
an HTTP link to their library OPAC, and that hence 
allow searches to be conducted among the catalogue 
records without users having to negotiate separate 
protocols. Several examples of the latter may be 
found in North America and Europe, where the serv- 
ers often make use of: the public-domain WAIS 
search software by which objects may be indexed, 
and their content searched; the ISINDEX tag avail- 
able in HTML, which allows the author to designate 
a particular Web object as searchable, and to request 
the user to enter search keywords; or the FORM 
facility provided by Mosaic, which allows the author 
to design pages or ‘forms’ including fields that the 
user may ‘fill out’ according to his or her needs. 
UK libraries, however, have been slow to follow 
the lead. The way forward would appear to be for 
libraries to encourage the commercial vendors of 
library automation systems to design their own prod- 
ucts in a manner such that they may easily be accessed 
using the HTTP as well as the Telnet protocol. It is 
Class A3 (with a library home page containing some 
local information in HTML format, but no HTTP 
links to external information): Bristol, Essex, Hull, 
Liverpool, London (Royal Free Hospital School of 
Medicine), London (Royal Postgraduate Medical 
School), North London, Oxford (Oxford University 
Libraries Automation Service), Sheffield, Stirling, 
Ulster (Coleraine), Wales (University of Wales Col- 
lege of Cardiff), Wales (University of Wales College 
of Medicine). 


Class A4 (with a library home page containing no 
local library information in HTML format): Lancas- 
ter, London (Royal Holloway and Bedford New 
College) and Strathclyde. 


Class B1 (no library home page, but the server supports 
a link to a Gopher which includes some local library 
material): Birmingham, Cambridge, City, De Montfort, 
Durham, Glasgow, London (Goldsmith’s), London 
(Queen Mary and Westfield College), London (United 
Medical and Dental Schools of Guy’s and St Thomas’ 
Hospitals), Newcastle, Nottingham, Oxford Brookes, 
South Bank, Sussex. 


Class B2 (no library home page, and the only local 
library material on the server is a Telnet link to the 
OPAC): Aberdeen, Abertay Dundee, Bradford, 
Brunel, Kent, London (King’s), Open, Portsmouth, 
Sheffield Hallam, Wolverhampton. 


Class B3 (no library home page, and no local library 
material on the server): Brighton, East Anglia, East 
London, Heriot-Watt, Kingston, London (Birkbeck), 
London (Imperial), London (University), London 
(Institute of Education), London Guildhall, Man- 
chester, UMIST, Middlesex, Plymouth, Reading, St 
Andrews, Southampton, Wales (Aberystwyth), Wales 
(Swansea). 

The following servers were not operational at the 
time the survey was undertaken: Anglia, Aston, 
Cranfield, Leicester, Salford and Ulster (Jordans- 
town). No Web server could be found at the following 
sites: Bournemouth, Buckingham, Central England, 
Central Lancashire, Coventry, Derby, Glamorgan, 
Glasgow Caledonian, Hertfordshire, Huddersfield, 
Humberside, Leeds Metropolitan, Liverpool John 
Moores, Manchester Metropolitan, Napier, North- 
umbria, Nottingham Trent, Paisley, Robert Gordon, 
Staffordshire, Teesside, Thames Valley, Wales (Ban- 
gor), Wales (Lampeter), West of England, and all 
unmentioned schools, colleges and institutes of the 
University of London. 

We must emphasize that these data represent a 
‘snapshot in time’ (i.e. the situation as it stood in 
September 1994), and are likely to become out of 
date very quickly. As a means of determining the 
level of library involvement in Web development, 
however, a classification of this type should remain 
useful for some time. 

Reflecting the Internet’s origins in the academic 
community, its exploitation at many universities is 
governed by policy formulated at a high level, and 
‘Search’. The result is the display of a list of those 
journals whose titles contain the string, together with 
the dates of the runs held by the library and their shelf 
location. The University of North London presents a 
similar form, and additionally allows simple Boolean 
combinations to be entered. The result is a list of 
matching items, which may be clicked-on in order to 
retrieve a full record giving details of the journal's 
frequency of issue and the location and dates of the 
library's holdings. 

Edinburgh University's CWIS (EDINFO) is 
unique in allowing the user to carry out WAIS 
searches, not only on the university's telephone and 
email address directories, but also on the whole 
content of the system's Web pages, as illustrated in 
Figure 4. 


4. Other special features. The University of West- 
minster takes further advantage of the FORM facility 
in its support of a page that invites users to enter 
comments, questions and criticisms about the server, 
and to fill in a questionnaire that asks ‘Which site 
are you based at?’, ‘What do you do?’, ‘Which 
browser(s) do you use?’ and ‘Which platform do 
you use?’. A similar comment form is provided by 
the Radcliffe Science Library & Bodleian Library 
at Oxford. 

A few other library Web pages have features that 
do not make particular use of advanced Web facili- 
ties such as ISINDEX or ISMAP, but that are 
uncommon enough to provoke comment. Three ex- 
amples are: the large GIF files storing views of the 
library provided by Dundee University; the list of 
recent book acquisitions provided by Leeds Univer- 
sity Library; and the subject-based lists of Internet 
resources grouped according to their relevance to 
university courses and departments found at Essex, 
Surrey and York. 


Conclusions 
An inspection of the many Web-based services that 
are already available serves to identify several gen- 
eral problems with current systems (though we must 
emphasize again the speed with which changes are 
taking place). 

Many servers are currently characterized by one 
or more of the following: a disorganized structure, in 
which links are provided to disparate targets in an 
almost random arrangement; an apparent confusion 
as to the nature of the audience (library staff, local 
library users or external browsers/searchers) the server 
is, or should be, targeting; and a tendency to replicate 
the efforts of others in the provision of links to 
targets commonly presumed to be useful. Improve- 
ments in these three respects might be achieved by: 
creating structured, subject-based hierarchies of links; 
defining the target audience(s) and authoring pages 
accordingly; and identifying unique local informa- 
tion, including full-text collections, that may 
profitably be converted into HTML format anc pub- 
lished on the Web. 
instructive to note that the Web OPAC at the most 
advanced stage of development in the UK, that at 
Loughborough University's Pilkington Library, is 
designed in association with the library's OPAC 
supplier, BLCMP. From the Library's home page, a 
user may select an HTTP link to a page describing 
the use of (and including a Telnet link to) Loughbor- 
ough's conventional OPAC. From there, the user 
may also activate an HTTP link labelled 'Prelimi- 
nary WWW OPAC', which results in the display of 
the page depicted in Figure 2. Four choices are 
provided ('Specific item search', 'Subject search', 
‘Borrower services’, and ‘Help’). In turn, selecting 
either of the first two of these results in the display of 
five choices from a selection that includes ‘Title’, 
‘Author’, ‘Number’, ‘Class’, ‘Subject’ and ‘Key- 
word’. The user is required to type in a query string, 
to decide whether to restrict his or her search to a 
particular sub-catalogue, and then to activate the 
button labelled ‘Begin search’. The output of the 
search is a ranked list of references to those docu- 
ments in the Loughborough collection whose database 
representations most closely match those of the query. 

It should be emphasized that Loughborough’s 
system is merely a prototype, and that development 
of this and similar systems at other sites is continuing. 
In the meantime, pressure could usefully be brought 
to bear on the commercial vendors of library OPACs 
to allow easy access to their products via HTTP as 
well as via Telnet, and to allow for the embedding of 
URLs in particular fields in catalogue records (so 
that, for example, a user viewing a bibliographic 
reference to a document in the OPAC may access the 
full-text of that document held in HTML format on 
the Web simply by clicking on the appropriate field). 


2. Image maps. Examples of clickable image maps 
are implemented by the Universities of Surrey and 
Westminster. These are images in GIF format marked 
up using the ISMAP tag that is supported by various 
browsers including Mosaic. Surrey provides active 
floor plans of the six levels of the George Edwards 
Library, with an average of three coloured hotspots 
per level, each linked to information relating to a 
particular area of the library (as illustrated in Figure 
3). Unfortunately, some of the hotspots are too small 
to be labelled, and so the nature of their target is not 
made clear. Westminster provides a map showing 
the sixteen locations of Information Resource Serv- 
ices, the department that administers both library and 
computing facilities at the university. At the time of 
writing, links from each numbered location were in 
place, but the target pages were empty. 


3. Searchable directories. Bath University Library 
provides the facility for the browser of its Web pages 
to search a catalogue of ‘all current periodicals and 
back-runs, including some items which do not ap- 
pear on the Library OPAC’. The user is invited to 
type a character string in a box marked for that 
purpose, and then to click on a button labelled 

There is another potential problem area that arises
from the globally-accessible nature of the Web: that 
of data security and integrity. Some organizations 
are more aware than others of the ease with which 
individuals may suggest, erroneously but often unin- 
tentionally, that the personal information they publish 
on the Web has received some kind of endorsement 
from their employers. Indeed, one university has 
taken the step of stating clearly on its home page that 
all those intending to place information on its server 
should first seek approval from no less an authority 
than the vice-chancellor.

This paper has focused on academic libraries in 
the UK since it is there that use of the Web is furthest 
developed, with no evidence from our survey of 
significant use in public and special libraries. That 
said, the considerable advances made by the more 
enterprising of academic libraries should not blind 
the observer to the extent to which certain institutions, 
even in the academic sector, lag behind. The ambi- 
tions of libraries in the ‘new’ universities, in particular, 
are often limited by ever-tightening budgets, and 
financial restrictions are sometimes exacerbated by 
the logistic and political problems encountered when 
library facilities are spread over several different 
sites. For example, one respondent alleged that his 
library is starved of equipment to the extent that 
senior academic librarians, let alone undergraduate 
users of the library, are unable to gain access to the 
Internet. This situation will surely change; but it is 
likely to be a considerable time before institutions 
such as this, or their public and special counterparts, 
reach the level of development that has already been 
achieved by some academic libraries. 

European information:


the pattern of provision in Scotland 


Rita Marcella and Susan Parker 


The School of Librarianship and Information Studies, The Robert Gordon University, 


Hilton Place, Aberdeen AB9 1FP 


Abstract 


As part of the research underpinning an Honours dissertation a study of the agencies providing European 
information in Scotland was carried out in the spring of 1994. The aims of this study were to investigate the nature 
of the services offered by the existing agencies, to consider their accessibility and geographic spread, and to 
determine the extent of the interaction which took place between these agencies. The study sought to provide a 
broad picture of provision across a physically large geographic area. Given the relatively few agencies involved, 
visits were made to each and structured interviews were carried out with staff. 


their name was changed to Offices of the European 
Commission in 1989, in order to reinforce their 
representative rather than informational role. Prior to 
the name change, the Commission Offices had been 
deluged with enquiries (in 1990 the London office 
received between 1,000 and 1,500 calls a day?). The 
focus, it was hoped, would now move to a relay 
network of information agencies which would re- 
spond to enquiries. However, all Commission Offices 
maintain a comprehensive collection of European 
documentation and have access to all of the databases
published by the European Commission. 

The office in Edinburgh is a sub-office. Its role is: 
to represent the European Commission and explain 
its policies; to disseminate information about the 
European Union; and to promote awareness of the 
impact of the European Union on the citizens of 
Scotland. The Office maintains a small reference 
library and all official European documentation. It is 
open to the general public. Contact is made with 
other agencies in Scotland via meetings of the Euro- 
pean Information Association and through the 
arrangement of special consultative meetings. 


Depository Libraries (DEPs) — none 

Depository Libraries contain comprehensive collec- 
tions of European publications. There are three in the 
United Kingdom, two held in public libraries and 
one at the BLDSC, but none in Scotland. Their role is
limited in terms of general access. They do however 
ensure that if the existence of a publication is known, 
then that document can be obtained via the Inter 
Library Loan network. 


The European Documentation Centres (EDCs) 
- Glasgow, Edinburgh, Dundee, Aberdeen 
European Documentation Centres were first estab- 
lished in the 1960s under the auspices of DGX. Their 
primary function was to serve the teaching and 

It was felt that this study was particularly timely,
given the impetus towards the better provision of 
information about Europe to the general public via 
the establishment of a national network of relays of 
information providers, The Public Information Re- 
lay, with the ultimate aim of 'bringing European 
Information closer to the people''. Therefore, the 
study included an examination of the provision made 
by several major city public library services. This 
policy is a fairly recent one, however, and strategies 
to respond to it are at present in the process of being 
developed in the public library sector in Scotland. 
The chief focus of the study was, therefore, the well
established forms of information provision directed 
towards the academic and research communities, via 
the European Documentation Centres (EDCs) in uni- 
versity libraries and towards the business communities 
via the European Information Centres (EICs). 
Carrefours or Rural Information Centres, which are 
intended, fairly obviously, to cater for the needs of 
rural communities and are again of fairly recent 
origin, were also examined. 

This article summarizes the findings of the study 
which was carried out, considering each of the cat- 
egories of agency in turn. It also considers the role of 
the Commission Office in Edinburgh, as the local 
embodiment of the European Commission and as a 
mechanism whereby implementation of European 
policy may be encouraged at the local level. The
intention was to focus upon information services, and
the assistance and support provided by local authorities
has not, therefore, been included as part of the
study.


The Office of the European Commission 

— Edinburgh 

One of four such offices in the United Kingdom, the 
Commission Office for Scotland is located in Edin- 
burgh. Previously Press and Information Offices, 

referral to other agencies took place, the EDC being
able to cope with any enquiries that had been made. 

Aberdeen's EDC is held in the Queen Mother 
Library of Aberdeen University. This collection is a 
selective one. One member of staff runs the service;
enquiries are not logged specifically, so
figures of use are not available. External enquiries
are however received on a daily basis, although 
approaches from the general public are rare. Addi- 
tional funding to supplement the collection is 
necessary. The service is promoted by means of an 
information sheet, European Communities collec- 
tion, which describes the European Union and its 
institutions and briefly explains the nature of Euro- 
pean publications. The EDC refers users to Aberdeen 
City Libraries for standard, patent and company 
information. There are plans to develop the use of the
EDC as a business resource. 

EDCs have a responsibility to report back to the 
European Commission. They have recently been 
issued with a new contract which places more em- 
phasis on communications with Brussels. 


The European Information Centres — Glasgow 
and Inverness 

While the information needs of the academic com- 
munity were being met by the EDCs, it became 
evident that the business community's needs might 
not be being adequately met. In 1987 a pilot study 
established 39 EICs across Europe. Since then 21 
EICs have been set up in the United Kingdom; two of 
these are in Scotland. EICs are based in a host 
organization, such as a Chamber of Commerce where 
links have already been established with the business 
community. There are also EICs closely associated
with EDCs, as in the services provided at the Univer- 
sity of Wales, College of Cardiff. 

EICs target SMEs and are supported by DGXIII 
in terms of funding, training, an E-mail network, 
publicity materials, access to databases and receipt 
of documentation. The E-mail network facilitates 
communication between EICs in all member states, 
so that information about a member state can be 
provided by the most appropriate EIC. However, the 
intention is for EICs to become self-financing and 
charges are therefore imposed for services. Clearly 
this factor has implications for usage of EICs and 
also means that they operate competitively, despite
co-operative measures such as the E-mail network. 

The EIC in Glasgow is supported by Scottish 
Enterprise, the Glasgow Development Agency and 
the European Commission. It has 10 members of 
staff. It has developed E-mail links with other busi- 
ness support organizations, such as Chambers of 
Commerce and Enterprise Agencies. The EIC pro- 
vides a documentation service, an enquiry service, 
access to the resources available via the E-mail net- 
work and a business cooperation service. In addition 
the EIC specializes in two areas, public procurement 
and support for research and development initiatives 

research needs of the academic community. There
are 44 EDCs in the United Kingdom, four in Scot- 
land. These are all located in cities. Access to a 
variety of European databases is provided free to all 
EDCs, as is documentation, but the latter may be 
either on a comprehensive or selective basis. How- 
ever, problems have arisen for many host libraries 
relating to the costs of supporting such a collection, 
in staffing, accommodation and so on. Equally, the 
recent discontinuation of free supply of EUROSTAT 
statistical information is proving to be a burden to 
libraries hosting EDCs, who now have to fund the 
supplementing of their collections. 

Edinburgh's comprehensive EDC is based in the 
Europa Library at Edinburgh University. In addition
to its own academic community, the EDC deals
with enquiries from other universities and colleges in 
the region and also from the business community. 
The pressure of such external demands may eventu- 
ally become a problem. The EDC has two members 
of staff and the University provides funds to extend 
the collection and to train staff. Accessions lists are 
circulated to staff and the library is keen to promote 
awareness of the EDC. There is also a user's guide to 
the collection. Enquiries are on occasion referred to 
the Commission Office. 

Glasgow’s EDC is also comprehensive and is 
based in Glasgow University Library. Its primary 
role is to serve anyone requiring European informa- 
tion in the West of Scotland. Many enquiries are 
received from the business community and in
particular from small to medium-sized enterprises
(SMEs), who are frequently referred by the Commis- 
sion Office to the EDC. More general enquiries are 
low in comparison. It is felt that demand is likely to 
increase. The EDC has one member of staff, whose 
training is funded by the library service. Enquiries 
are recorded and lie in the region of between 20 and 
40 per week. Many of these are lengthy and detailed 
and are, therefore, very demanding of staff time. The 
Library purchases extra material to support and
supplement that which it receives free. Publicity is
aimed at the academic community within the Uni- 
versity. Free documentation is often slow to arrive 
and is poorly indexed. 

The EDC housed in the Law Library of the Uni- 
versity of Dundee is a comprehensive one. Services 
are provided both to the University and to Dundee 
Institute of Technology, to schools and to the busi- 
ness community. The EDC has one member of staff. 
All enquiries are recorded and classified as to their 
source. The majority received are from students, but 
overall use is steadily growing. Again it has been 
necessary for the University to fund the purchase of 
statistical material. Publicity is aimed primarily at
the academic community. A leaflet, The European
Community and you, and subject listings of materials,
on topics such as public procurement, have been
produced. A guide to the collection was in the process
of being compiled at the time of the study. Little

to the Carrefours, enables database access and
answers specific questions on demand. Carrefours are
‘intended in part to rectify the growing imbalance 
between urban centralized development and rural 
marginalization and peripherality’®. They are 
significant in support of the information and devel- 
opment needs of rural Scotland. There can be little 
doubt as to the impact of European policy on the 
agricultural sector and any resource which can help 
those in rural areas to understand the workings of the 
Union must be a positive move. 

Carrefours organize community information
sessions on European issues of practical interest locally,
such as rural tourism. They answer specific ques- 
tions, or direct enquiries to appropriate sources that 
will provide answers. They will trace EU publica- 
tions. 

The role of the Carrefour network (carrefour in 
this context means a place to meet, discuss, act and 
change) is: 

• to inform people in rural areas of the EC's policies
and the help available through EC programmes

• to stimulate discussion and encourage partnerships
between different groups in rural areas

• to facilitate the exchange of information and experience
between different rural areas of the
community

• to feed back information to the EC on the dynamics
of the region’.

The Carrefour in Inverness is housed alongside
the EIC. Information days and seminars are organized
across an area extending from Shetland to the Mull of
Kintyre to inform local rural communities about
the work undertaken by the Carrefour and the services
which it can offer. The Inverness Carrefour is
very recently established and forms part of Business
Information Source; its primary aim at present
will thus be to build awareness of its value and contacts in
the local community. It has also made DGX’s Rural
Society database available online.

It was unfortunately impossible to visit the Dum- 
fries Carrefour in the course of the study. Rob Cockburn 
described its role in 1993 and emphasized that it was 
‘playing a proactive role in rural development and not 
relying simply on the information service generating 
development by others’. Rural Gateway has, for
example, been involved in the generation of a Busi- 
ness Initiatives Development programme designed 
to encourage teams of entrepreneurs. 

Given that Carrefours have existed for little more 
than two years their services require time to mature. 


Public library services — Glasgow, Edinburgh, 
Dundee and Aberdeen 

Within the constraints of the study, it was only 
feasible to examine provision in a small number of 
public library services. Major city libraries were 
chosen as likely to display the most extensive collec- 
tions within their well established reference 
departments. 

in industry. Great efforts are made to promote the
service via media releases, leaflets, industry visits and 
direct mailing runs. The Glasgow EIC reports back in 
some detail to the Commission on its activities. It 
continues to receive support from the Commission 
and if support were completely withdrawn, then the 
EIC might not be able to survive independently. A 
factor here is the acknowledged reluctance of SMEs to 
pay for information and in many instances their failure 
to recognize its potential value. 

The EIC in Inverness is called European Busi- 
ness Services and is hosted by Highland Opportunity 
Ltd, the enterprise trust for the region. It has three 
members of staff. As with the Glasgow EIC, it can be 
accessed via a network of business support agencies 
which cover the physical geographic area. This EIC 
offers a range of services including partner searches, 
commercial opportunities, materials sourcing, pub- 
lic procurement etc. It provides targeted packages of 
current information in areas such as financial assist- 
ance, food regulations and market planning. 


The European Reference Centres (ERCs) — 
Edinburgh, Stirling and Inverness 

European Reference Centres were established in 1977. 
They provide access to a basic collection of Euro- 
pean documentation, consisting primarily of core 
reference works such as catalogues and bibliographic 
sources. There are 20 ERCs in the United Kingdom;
three are based in Scotland — at the National Library 
of Scotland, Stirling University and Inverness Public 
Library. 

Given this variety of host organization, it is more 
difficult to generalize about the precise function of 
these basic reference collections. Stirling University 
and Inverness Public Library will be accessed in 
each instance by their primary user groups. The 
National Library of Scotland aims its provision
primarily at undergraduate reference and research.
It frequently refers users to the EDC at Edinburgh
University for more detailed information. Few
enquiries, however, are received.

The ERC in Inverness was nominated in 1985. 
When its basic collection proves inadequate, staff
frequently refer users to the Inverness EIC. Given
the charges imposed by EICs, this may be
a barrier to access to European information for the
general public, although approaches from the general
public have been infrequent. To date the library
service has not developed or added to the basic 
collection. 


Carrefours or Rural Information Centres — 
Dumfries and Inverness 

The first Carrefours or rural information services 
were established in 1990 to bring European informa- 
tion to rural communities. There are three in the 
United Kingdom, two of which are in Scotland. The 
network is supported by a unit in Brussels in DGX 
which co-ordinates meetings, supplies information 

The majority of European information in Glasgow
City Libraries is held on the Social Sciences
floor of the Mitchell Library. As a reference library 
most of the Mitchell’s collection is on closed access. 
Again demand for European information has not 
been high and requests for more detailed information 
are referred to the EDC at Glasgow University. A
select bibliography was compiled on '1992: The
Single European Market', which brought together 
sources that might be of interest to the general pub- 
lic. Again there were no publicity materials or guiding 
specific to European information. 

The Central Library in the Wellgate shopping 
centre houses the majority of Dundee District Librar- 
ies’ European information. Again demand had been 
low and it was fairly common for enquiries to be 
referred to the EDC in Dundee. There was no public- 
ity specific to European information or guiding to 
highlight the existence of such information. The 
library service was, at the time of the study, very 
positive about the possibility of developing their 
provision of European information and felt that it 
was very much part of the library’s role to involve the 
public in European issues. 

European information is housed in the Business 
and Technical Department of Aberdeen City Librar- 
ies, perhaps reflecting a different emphasis from the 
other public library services so far discussed. Demand 
here is felt to emanate primarily from the business 
community, as well as from educational institutions. 
Enquiries from the general public were, however, 
infrequent. Enquiries averaged between three and 
four a day. In order to meet an expectation of higher 
demand EC Info Disk had been purchased on CD- 
ROM. It was common for referrals to be made to the 
EDC in Aberdeen. The library had guiding to draw 
attention to their European information section and 
had produced a promotional leaflet. Periodically dis- 
plays were also used to highlight the availability of 
such a resource. The library were keenly aware of the 
significance of keeping the community informed of 
activities and developments within the European 
Union. 

At the time of the study only Aberdeen held 
European information on CD-ROM.


Conclusions 
There was little evidence of referral and use of other 
agencies from the interviews that took place and 
where referral was made it was largely to other 
agencies in close geographic proximity. What inter- 
action there was seemed to be very much on an 
individual and ad hoc basis. There was often a lack of 
awareness of what precisely the other agencies had 
to offer. The most informed, frequent and effective 
referral took place where there was already a good 
deal of interaction between services. 

The study showed that growing demands were 
being placed on certain agencies, although these 
varied in their nature. The EDCs face a broadening of 

Although at the time of the study the development
of policy on access to European information by
the general public had not been far developed, there 
were indications that the role of the public library 
sector was set to grow. A Gallup survey of informa- 
tion needs in Britain had reported that 70% of all 
respondents cited their local library as a source which 
should be doing more in terms of the provision of 
European information. Consultation is ongoing 
between Commission representatives and representa- 
tives of the library services, such as FOLACL 
(Federation of Local Authority Chief Librarians), 
SLIC (Scottish Library & Information Council) and 
COSLA, which is seeking to build a network of 
public library services who are willing to participate. 
Such participating libraries will receive: free copies 
of basic texts on the European Union; a 50%
discount on priced publications from EUR-OP; a 50%
discount on access to certain EU databases; supply of
handout material produced by the European Com- 
mission; a list of essential publications; opportunities 
to meet with other members of the network; guid- 
ance from European Commission offices; a 
newsletter; a directory of members of the relay; and 
training on the use of European Commission docu- 
mentation. However, no direct funding will be 
available. 

It is only fair to emphasize that, at the point at 
which this present study was carried out, the
consultation process had not yet begun in Scotland. Although
staff were asked at interviews about their awareness 
of the programme, few had been directly involved. 
The picture described here, therefore, reflects the 
services provided before the EU initiative effectively 
got underway. In the aftermath of the National Con- 
sultative conference held in England in January 1993 
and a FOLACL seminar held in December 1993, 
Michael Dolan reported that ‘... it is now clear that
there is overwhelming support from the UK library
community for the principle of improved access to
European information’.

While only eight Scottish local authorities had 
joined the Public Information Relay by October 1994⁸,
it is hoped that the response will eventually be over- 
whelming in Scotland. More than 80 UK public 
libraries have so far joined the relay. 

Edinburgh City Libraries houses within its refer- 
ence department a collection of European material. 
Demand is in the region of three enquiries a day, a 
very small proportion of total enquiries. Staff fre- 
quently refer users to both the EDC and the 
Commission Office. A bibliography and subject guide 
to European information has been compiled as an aid 
to staff. No publicity material or guidance specific to 
European information was provided for users. The 
speed at which European information dates was felt 
to be a significant problem for any library service 
building a collection. The library service felt that 
demand was likely to grow, particularly for standard 
and legislative information. 

evident willingness of the public library sector to
co-operate towards the achievement of such a net- 
work of relays, the picture which is presented in this 
paper will change very rapidly. National consulta- 
tive conferences have been held in England, at Stoke 
Rochford in 1993 and 1994. Consultation has also 
been encouraged in Scotland through a meeting of 
public sector Chief Librarians organized by the Office 
of the European Commission, SLIC and COSLA. 
The players are certainly talking to each other and we 
await developments with great interest. 

The situation is, perhaps, much less dynamic for 
the longer established agencies, such as the Euro- 
pean Documentation Centres, the European 
Information Centres and the Rural Information Cen- 
tres. The picture presented there is less liable to 
dramatic change. However, with new contracts be- 
ing negotiated with the EDCs in response to the 
move towards greater transparency and a broadening 
of access to information that had hitherto been avail- 
able only to small elites, change may take place in all 
of the agencies described in this paper’. The Euro- 
pean Information Association also continues to 
support and encourage greater interaction and co- 
operation amongst all of its members — approximately 
360 organizations according to the directory of mem- 
bers, which include EDCs, EICs, public library 
services, local authority departments, national li- 
braries, government agencies, Chambers of 
Commerce, solicitors, financial institutions and other 
companies. This wide membership reflects the range 
of interest in European information provision. It is 
important that EDCs, Carrefours and EICs — as well 
as this representative range of interested parties — see 
themselves as part of the Public Information Relay 
and contribute to its development. As Ian Thomson, 
then Chairman of the European Information Asso- 
ciation, said in 1992, ‘It will be important to convince 
existing networks such as EICs and EDCs as to the 
value of joining this informal umbrella network’.

Giancarlo Pau, in a speech at the Public Libraries 
Conference in York on the 28th of September 1994, 
described the activities of the newly established In- 
formation Network Unit. The emphasis, he told the 
delegates, is now on a decentralized information 
policy which will:

"bring together various existing, but as yet
uncoordinated information outlets, so as to
enable them to come together as part of a
single national network for the provision of
EU information in all aspects, separate from,
but in partnership with the European Commission".

The UK initiative is the first of its kind in Europe. 
The forthcoming months will be of great interest to 
the information profession as the response consoli- 
dates. Research in this dynamic area continues at the 
School of Librarianship and Information Studies, the 
Robert Gordon University, and we would most 
warmly welcome input from interested parties. 

their client base — internal and external students,
academic staff and researchers, the general public, 
the business community and schools all make ap- 
proaches. They also face a diminution in their resource 
base resulting in library services having to supple- 
ment their collections. This is in addition to the other 
costs already associated with the housing of an EDC 
— administrative, staff and accommodation — which 
are not inconsiderable. On the other hand the EICs 
face the challenge of becoming completely self fund- 
ing. 

Communications between the various agencies 
and the European Commission were not uniform. 
The EICs and the Carrefours have established
mechanisms by which they report back to the Commission.
For the ERCs and the EDCs, however, there seems to
be little interaction. The ERCs in particular seemed
conscious of this lack. 


Geographic distribution of 
European information agencies 

There is distinct variation in the pattern of provision
geographically across Scotland (see map of
distribution of service points). Given the evidence of 
approach to local agencies this may create imbal- 
ances in terms of access. The implications of some of 
the findings detailed above for the development of 
the Public Information Relay may be significant.
The development of services in Edinburgh, which 
has a number of existing resources, may have to be 
very different from that in Highland. Equally there 
are problems in developing a free service to the 
general public when these are set alongside existing 
charged services. 

It is likely — and to be hoped — that, given the 
development of the Public Information Relay and the

Abstract

This paper examines the implications for translation services of the exploding use of new telecommunications
technologies, with particular emphasis on computer communications in cyberspace, represented by the Internet. 
The relationship between language translation and cyberspace in an advanced information society is explored to 
show that the new communications environment is both creating demand for translation and helping to meet that 
demand. The paper concludes that the symbiosis between translation services and the emerging communications 
environment could culminate in ‘a teletranslation service’ as an International Value-Added Network Service
(IVANS) with worldwide accessibility, using networked translators best to meet growing demand.


and, increasingly, private individuals. Furthermore, 
the merging of computers with telecommunications 
has led to an increasingly porous and interlinked 
world. 

On the surface, these trends suggest a breaking 
down of national barriers and an ongoing reduction 
in and perhaps eventually the elimination of 
impediments to global communication. The reality, 
however, is not so straightforward. The new infor- 
mation superhighways are opening the way to an 
enormous volume of cross-cultural communication 
and as a result they are inviting a variety of poten- 
tial language problems in the process. While an 
American businessman can carry in his briefcase a 
notebook computer and a cellular phone, and con- 
vert a hotel room into a virtual office, he is totally 
helpless when modern telecommunications brings 
his Saudi client on the other end of the line, speaking
Arabic, or when his urgent fax in English sent to 
China fails to produce results. A New Zealand 
researcher may use the Internet e-mail service to 
request information from a Japanese researcher, 
only to find that his or her counterpart does not 
understand English. It seems that in the past the 
difficulty of gaining communications access al- 
most overrode the issue of language problems. 
Although the actual language barrier is no greater 
now than before, the perceived barrier is, because 
of the ease of communication access afforded by 
advancement of communications technology as 
illustrated by Figure 1. Ironically, now that the 
technical problems are being solved the real prob- 
lem of communication is emerging, and it is 
immense. The users of sophisticated communica- 
tions links will be increasingly exposed to foreign 
languages and cultures. 

The following discussion is based on a research 
project I have recently completed for the New Zea- 
land telecommunications industry and has a focal 
Introduction
If we describe the 1980s as the age of computing, the 
1990s must be the age of global networking. It has 
been made possible as a result of the merging of 
computers and telecommunications technologies 
which already allow us to communicate using voice, 
text or image in real-time or non real-time irrespec- 
tive of location. Today computers not only form an 
integral part of all modern telecommunications sys- 
tems, providing the 'intelligence' required to connect 
people on demand, they are increasingly 'customers' 
of telecommunications too, using the networks to 
link up other computers in diverse geographical lo- 
cations. The interlinked computers enable information 
of any type to be carried almost instantaneously in 
the form of digital signals over great distances and 
create the world of 'cyberspace', a term initially 
coined by William Gibson in his novel Neuromancer. 
The importance of this communications space to the 
translation industry has been reflected in the grow- 
ing number of recent papers presented in the 
‘Translating and the Computer’ series of confer- 
ences with a focus on telecommunications aspects. 
The applications of cyberspace are many and 
varied. They may include the communication space 
created by airline, hotel and car rental reservation 
systems which enable international travel arrange- 
ments to be made from a travel agent's terminal 
anywhere in the world. In cyberspace there are vir- 
tual libraries of millions of volumes and many 
thousands of papers which are already benefiting 
people the world over. While tele-learning is 
revolutionizing traditional learning systems, the
growing use of personal computers at home is popu- 
larizing electronic services such as bulletin board 
systems (BBS), home banking, electronic mail (e- 
mail) and teleshopping. In short, cyberspace is 
providing the channel for a huge range of informa- 
tion exchange required by industry, governments 
Figure 1. The growing significance of language as a barrier to communications
Tele-conferences (audio and video conferences)
Audio and video tele-conferencing are now supple-
menting and in some cases taking the place of physical
meetings, as falling telecommunications charges lead
to cost advantages compared with travel and accom-
modation. Whenever tele-conferences cross language
boundaries the services of interpreters and often 
translators as well will be required, and these could 
be provided in a variety of different ways. Confer- 
ences can create a demand for enormous amounts of 
text translation as well as interpreting services since 
participants typically need to exchange copies of 
their contributions, presentations and reports in writ- 
ten form. Today, fax machines and sometimes 
electronic whiteboards with fax transmission capa- 
bilities supplement tele-conference audio and video 
links to carry the associated papers, and wherever 
multiple languages are involved, translators are re- 
quired. 

Future conferences are expected to be multi-
media affairs with each participant using a networked
desktop computer terminal with text, voice and im- 
age channels all combined. Presenting a ‘paper’ will 
involve displaying text, diagrams and pictures on 
participants’ terminals while simultaneously giving 
an oral explanation, and participants will be able to 
respond via the same media. In addition to simulta- 
neous interpreting of the discussion, there will be an 
expectation that text will be delivered immediately 
to each participant’s terminal in the language of their 
choice. In the past, conference participants accepted 
that requirements for translation necessitated their 
point of how telecommunications can be used to
resolve language barriers. In the wake of the sudden 
prominence of the information superhighway with 
the exponential growth of the Internet, translators as 
information and communications workers clearly 
cannot afford to ignore this emerging new communi- 
cations environment. 


The linkage between language services and 
cyberspace 

A significant volume of new demand for language 
services is likely to arise directly from the enhanced 
capabilities for connectivity offered by modern tel- 
ecommunications. By their particular nature, certain 
telecommunications-based services are inherently 
likely to create their own demand for language 
services. In other words, while new communica- 
tions services aim to make people and information 
more accessible they also make language barriers 
between the communicating parties more promi- 
nent. Communications services of this type are 
summarized in Table 1, where they are classified 
according to the type of language service demand 
that may arise. 

Although the table makes a distinction between 
voice-based and text-based services, increasing me- 
dia convergence is tending to blur this demarcation 
from a telecommunications service viewpoint. Nev- 
ertheless, interpreting and translation skills are 
somewhat different, so the distinction is still mean-
ingful for the language service provider, and this
paper focuses on the latter.
the expansion of these forms of communications
across language barriers. A significant proportion of 
computer-to-computer communication of this type 
does not demand real-time translation; some delay in 
processing text can be accepted. But some activities, 
such as the real-time chat mode, can involve immedi- 
ate interaction — written conversations in effect — and 
for these to take place across a language barrier it will
be necessary to have a translator — human or machine 
— in the pipeline between the communicating parties 
operating in real-time. 

Translators will find an e-mail address as essen- 
tial to doing business in the next decade as a fax 
number is today. The issue of incompatible encoding 
schemes for electronic transmission of non-Roman 
scripts will need to be addressed at an engineering 
level in order to facilitate the smooth flow of informa- 
tion through the text production process. This issue is 
touched on later with a closer examination of e-mail. 


Specialized terminals 
There is a growing trend towards the use of terminals
in public places to provide specialized information 
and allow transactions to be made on a casual basis. 
For example, 50 interactive ‘information kiosk’
terminals are being installed in malls, grocery stores,
etc. in Texas, USA, to provide such information as 
details of government jobs, unemployment assist- 
ance, etc. in both English and Spanish’. Other examples 
include: tourist guides which display accommoda- 
tion, restaurant and transport information; airline and 
rail timetable displays with the capability of making 
reservations and issuing tickets; automatic bank tell- 
ers which allow customers to query bank accounts 
and to transfer, deposit and withdraw money. 
Suppliers of these systems need to give consid- 
eration to the language needs of their anticipated 
users. These systems are potential sources of lan- 
guage demand, as they all require the user to be able 
to interact with the terminal using the language dis- 
played and they are all especially likely to be used by 
tourists. There is some evidence that language needs 
are already being addressed, as my husband discov- 
ered recently when attempting to extract money from 


supplying advance copies of papers, and that trans- 
lated copies of final reports might not appear until 
weeks after the end of the meeting. This is no longer 
acceptable in the faster time-frames of today’s and 
tomorrow’s tele-conferences. Translation services 
will have to find ways of combining human and 
machine resources to reduce turn-around times for 
their products to a minimum, and to link their com- 
munications channels directly into those used by 
conference participants. 


E-mail, bulletin boards & database access 
The practice of gathering information from databases 
accessed via dial-up computer networks is growing 
steadily, especially in the academic and business 
sectors. A huge range and depth of publicly accessi- 
ble information are available, and by subscribing to 
electronic networks which have international links 
via Internet, for example, information can be easily 
obtained from all over the world. But databases in 
foreign countries will inevitably mean the provision 
of information in languages unfamiliar to the recipi- 
ent, and today, getting such information translated 
often means delivering a printed copy to a translation 
company. Some information providers may antici- 
pate demand and offer their data already translated 
into a range of languages by building in translation as 
part of their services. In other cases, however, it will 
be up to end-users to arrange their own translations. 
Similar translation requirements will arise from a 
range of other branches of computer communica- 
tions. E-mail and bulletin boards now enable students 
working on research projects to interact with their 
counterparts throughout the world, provided they all 
understand the same language. Or, if you have no time 
to go out shopping, a growing range of goods can be 
ordered from your computer terminal for delivery to 
your home or office. For example, the CompuServe 
network offers access to some 120 different teleshops, 
enabling orders to be placed via computer for items 
ranging from airline seats through books, coffee and 
contact lenses to real estate, software and stocks and 
shares — for users who can understand English. The 
provision of translation services is a prerequisite for 
cations. But telecommunications technology can still
help to meet demand from other sources by providing
means to streamline the translation process by link- 
ing the information source, translator and information 
recipient electronically. The key requirements for 
translation services of short turn-around, high qual- 
ity and low cost are becoming increasingly difficult 
and sometimes impossible to satisfy without tel- 
ecommunications links. For example, use of 
immediate and efficient electronic links to connect 
appropriate human and machine (i.e. MT) resources 
in diverse locations may make it possible to com- 
plete a seemingly difficult high volume job in a rare 
language pair on time. While today's translation 
services already make good use of fax and modem to 
connect with customers, and with remote working 
freelance translators, these links often require ad-hoc 
arrangements which eventually run up high commu- 
nications costs and are likely to cause difficulties in 
coordinating each resource in a coherent manner to 
deliver quality products. 

The past decade has seen several translation serv- 
ices come into being which have been deliberately 
designed for maximum integration with telecommu- 
nications systems. AT&T Language Line, KDD 
Teleserve and Translatel are subsidiaries of AT&T, 
KDD and France Telecom respectively, designed to 
exploit the full potential of telecommunications in 
meeting language needs which may also have origi- 
nated in the use of telecommunications. They offer 
real-time interpreting services via telephone as well 
as translation services, using facilities such as 
e-mail. These are the first generation of what I call 
‘teletranslation’ services. Falling prices of telecom-
munications mean that language barriers today could
form a much greater discouragement to international 
communications than does the cost of the communi- 
cations channel. And computer network service 
providers are also making translation services avail- 
able to their subscribers who are increasingly 
venturing into unfamiliar linguistic territory in search 
of information or pure entertainment. 

Translation services are starting to appear
on the menu of many publicly accessible computer
networks, including the three major networks in
Japan, PC-VAN, NIFTY-Serve and ASCII Net. These
networks offer twelve such translation services, half 
based on human translation (two of which are exclu- 
sively for post-editing of translation produced by 
MT) with the remaining six services based on MT. E- 
mail is used as a primary medium for all these 
services for text delivery, complemented by fax. In 
Europe similar services are reported to be offered by 
Systran’. Of particular interest in terms of
teletranslation is one recently formed company, WORDNET’.
It offers a worldwide translation service via the US 
based DELPHI Internet Service and also directly via 
the Internet using a combination of fax, e-mail and 
data communication. WORDNET employs over 1000 
remote working translators and is cited as a success- 
an automatic teller machine in Hong Kong using his
New Zealand credit card. Faced with a screen full of 
instructions in Chinese, he hoped he would be able to 
guess his way through the process successfully. To 
his relief, when he inserted his card, all instructions
changed to English! Obviously the system is pro- 
grammed to recognize the card's origin and to respond 
in a suitable language. Unfortunately the money 
machines in New Zealand are not yet offering the 
equivalent service for Chinese visitors. 
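
The behaviour observed at the Hong Kong machine can be pictured as a simple lookup from the card's origin to a display language. The sketch below is only an illustration of that idea: the country codes, the mapping table and the function name are assumptions made for this example and do not describe how any real teller network is programmed.

```python
# Hypothetical table: issuing country of the card -> preferred screen language.
CARD_COUNTRY_TO_LANGUAGE = {
    "NZ": "en",
    "US": "en",
    "HK": "zh",
    "CN": "zh",
    "JP": "ja",
}


def display_language(card_country: str, supported: set, default: str) -> str:
    """Pick the screen language for a card, falling back to the local default.

    card_country: two-letter code read from the card (assumed to be available).
    supported:    languages the terminal has stored text for.
    default:      language used when no suitable match exists.
    """
    preferred = CARD_COUNTRY_TO_LANGUAGE.get(card_country.upper())
    return preferred if preferred in supported else default


# A New Zealand card in a Hong Kong machine that stores Chinese and English text.
print(display_language("NZ", {"zh", "en"}, default="zh"))
# A Chinese card in a machine that stores English text only.
print(display_language("CN", {"en"}, default="en"))
```

Because the text for each language follows a predetermined format, a terminal of this kind only needs the lookup and the stored screens; no translator is involved at the time of the transaction.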

Because these services follow a predetermined 
format, text in a given language can generally be 
stored at the terminals themselves, as in the case of 
the Texas information kiosk terminals. When serv- 
ices provide interactive responses between users and 
remote databases they need to be connected to the 
supplier of an appropriate language skill, human or 
machine, as required by users, to respond to ad hoc 
situations. There is no doubt that more user-friendly 
voice-activated systems will become available in the 
coming decade. 


Other telecommunications-based services 
The marriage between computers and telecommuni- 
cations is having an impact in a number of other 
areas with possible language implications. Educa- 
tion, for example, is a major growth market. 
Tele-learning allows peer students to interact at a 
distance and enables individuals to receive instruc- 
tion from experts in a given field irrespective of 
location. To be truly international, education pro- 
vided in this manner will require language services 
in the communications channel between teachers 
and students. Entertainment is another potential mar- 
ket for language services, particularly as interactive 
TV and on-demand video become readily available. 
We may soon be thinking in terms of hundreds of TV 
channels to choose from instead of a dozen, and this 
will certainly include foreign language content. Call- 
ing up a foreign film which can be dubbed or subtitled 
on demand in your language may be possible. 
Thanks to telecommunications, the publishing 
industry now has the possibility of electronic distri- 
bution of text and graphics. Already we can read the 
latest Stephen King novel on Internet ahead of the 
printed copy, and books can be downloaded to our 
PCs to read on-screen or print out, as we prefer. Such 
on-demand publishing will require translation if it is 
to access foreign language markets, and teletranslation 
will provide the solution. Instead of a lengthy delay 
while you wait for your favourite author's new titles 
to be translated a few years after the publication in 
the original language, the translated version could be 
delivered almost simultaneously with its first re- 
lease. 


How telecommunications can help meet the 
demand for translation services 

Of course not all new demand for language transla- 
tion will arise from increased use of telecommuni- 
although advertising in newsgroups is virtually pro-
hibited. In relation to the translation industry, the
Internet also offers a wide variety of services. For 
background research purposes, its rich sources of
information via various databases will be useful,
while online access to multilingual terminology
databanks such as EURODICAUTOM would be a
distinct advantage. Translators can also order
books online via teleshopping, while a quick scan
of the world news, offered by different sources
around the world, can keep them informed on
relevant current issues. I can access Japanese
newsgroups and view their messages in Japanese. 
The Internet's software utilities section’ indicates 
relevant ‘sites’ from where various software tools 
can be downloaded, for example, to enable text 
written in such languages as Chinese, Korean and 
Japanese to be read. It seems only a matter of time 
before public-domain PC-based MT software starts 
to appear on the Internet to help users ‘surf’ around
language barriers and access information provided 
in different languages. Probably the Internet's most 
utilized and perhaps its most relevant service in 
relation to the translation business is e-mail.


E-mail: Its advantages and remaining problems 

A significant simplification in the procedures in-
volved in transferring information between two
remotely located computers can be achieved if
both subscribe to an e-mail service. E-mail essentially
uses one or more intermediate computers to store 
the electronic message, or file, until it is convenient 
for the intended recipient's computer to receive it. 
E-mail effectively provides a subscriber with an 
addressable electronic mail box in a host computer 
network. Other computers can dial up the network 
and deposit files into the subscriber's mailbox (us- 
ing appropriate communications software and a 
modem connected to the telephone network). When 
convenient, the mailbox subscriber can similarly 
access his mailbox and download its contents to his 
own computer. By belonging to an e-mail service, 
the need to be concerned with matching communi- 
cations-related parameters required for modem- 
to-modem communications is removed, as these 
settings are in general fixed for the subscriber's 
relationship with the e-mail service. Another con- 
venience offered by e-mail is that the use of an 
intermediate network enables asynchronous com- 
munications between the two end point computers. 
This means they no longer both need to be linked 
into the telephone network at the same time in order 
to exchange text — often difficult to arrange particu- 
larly if the communicating parties are in different 
time zones. The economics of using e-mail versus 
paying toll bills for direct phone links between 
computers will of course depend on subscription 
and usage costs of the e-mail service used and the 
telephone charge rates for the distance and time 
ful example of a business which is based on elec-
tronic communications technology, linking the
customer and the company on the one hand and the 
company's project managers and its translators on 
the other. CompuServe, one of the largest online 
service providers in the world, is also introducing an 
MT-based translation service during this year to 
cater for the needs of its European customers specifi- 
cally. 


Emerging electronic commerce and the role of 
teletranslation 

The symbiosis between the new communications 
environment and the translation service seems evi- 
dent and as a result various teletranslation services 
are developing. Symbolizing the new communica- 
tions environment is the exponential growth of the 
Internet, and the emergence of electronic commerce 
as an Internet application has particular relevance to 
the future development of the translation service. For
example, a trial version of software can be 
downloaded from an Internet 'site' which can be 
accessed by any user located anywhere in the world. 
I found an online help service offered via e-mail by 
an overseas software company cheaper than sending 
fax messages and much more helpful than
‘user-unfriendly’ manuals. Marketing products and dis-
tributing services in this way are also effective in
terms of reaching a large number of potential cus- 
tomers simultaneously via a single window. 
WORDNET is another example of taking advantage 
of this environment. 


Internet applications 

The Internet, developed out of the first extensive 
US computer network ARPANET created in 1968, 
is now accessible in about 137 countries to users 
whose numbers have grown from a few thousand to 
an estimated 25 to 40 million during the last decade. 
Analysts predict that by 1997 some 120 million 
users will be on the Internet, which will then be 
providing an even greater range of services. This 
world's biggest network of networks carries traffic 
ranging from personal mail (e-mail), over 2000
hobbyist bulletin boards (newsgroups) and
entertainment services through R&D activities by
academic and business sectors to commercial infor- 
mation for marketing and sales. The Internet's 
growth rate of a million new users every month 
heralds the dawn of the network-based society. And 
as such it holds far-reaching implications for the wider
community as well as for the translation business in
particular.

Although the Internet started as a non-commer- 
cial research-oriented network and still retains these 
characteristics, the situation is changing rapidly. 
Today more than half of the Internet traffic is com- 
mercial and much of its increasing popularity is of a 
commercial nature’. It is possible to advertise prod- 
ucts and also provide actual services via the Internet 
Conclusion: bringing it all together

A future translation service must be considered as 
one component of a highly information and network- 
oriented society in which all kinds of information in 
electronic form are traded internationally. Such an 
information society is likely to undergo continuous 
and rapid development just as demonstrated by the 
rapid proliferation of the use of the Internet during 
the last few years. As has been argued the language 
service has a deep affinity with such an environment 
and indeed has an important role to play in this 
increasingly interlinked world. The application of e-
mail alone suggests how a conventional translation
service could expand its market and resources so as
to provide worldwide services. The concept of
teletranslation is based on the application of struc- 
tured and efficient global networks to bring language 
service providers and their customers together in 
cyberspace while behind the scenes human experts 
and MT are linked by well designed communications 
networks. A teletranslation service can effectively
be seen as a form of International Value-Added
Network Service (IVANS) as a result of convergence
of information services with the computer and tel-
ecommunications. Any information service of today
will be a potential candidate to become an IVANS as
electronic commerce becomes widespread. Realiza- 
tion of language service as a form of IVANS calls for 
a stronger union between today's translation service 
and the telecommunications industries to bring about 
its progeny — a teletranslation service. 
involved, but generally, for long distance commu-
nications there are cost advantages in using e-mail.
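
The mailbox model of e-mail, in which an intermediate host stores a message until the recipient finds it convenient to collect it, can be illustrated with a very small sketch. The class name, method names and addresses below are invented for this illustration and are not taken from any actual e-mail service.

```python
from collections import defaultdict


class MailHost:
    """A toy intermediate computer that holds messages for its subscribers."""

    def __init__(self):
        # One list of waiting messages per mailbox address.
        self._mailboxes = defaultdict(list)

    def deposit(self, address: str, message: str) -> None:
        """Sender side: leave a message; the recipient need not be online."""
        self._mailboxes[address].append(message)

    def collect(self, address: str) -> list:
        """Recipient side: download everything waiting and empty the mailbox."""
        return self._mailboxes.pop(address, [])


host = MailHost()
# The client deposits the source text during their own working day...
host.deposit("translator@example", "Source text for translation")
# ...and the translator collects it later, in another time zone.
print(host.collect("translator@example"))
print(host.collect("translator@example"))  # nothing left after the first download
```

The two parties never need to be connected at the same time, which is the asynchronous property the cost comparison with direct modem-to-modem links rests on.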

Whether a subscriber to a particular e-mail serv- 
ice can access a given e-mail address in another 
network will depend on the interconnection arrange- 
ments in place between the networks involved. At 
the time of writing, for example, the Japanese net- 
works NIFTY-Serve and PC-VAN have both recently 
opened up links to the Internet for their e-mail serv- 
ices. The JUNET (Japan UNIX Network) also has 
international links. Other commercial e-mail serv- 
ices connected via Internet gateways include X.400 
mail servers, CompuServe, America Online and MCI
Mail, among the best known. This may sound rather
primitive in comparison with making an interna- 
tional phone call, but my personal experience 
indicates that an actual trial is the best way to deter- 
mine whether communication can be achieved to a 
particular e-mail address.

Considering these advantages, e-mail provides 
an ideal medium for translation service providers to 
send and receive the text, and for posting questions 
to clients and fellow translators. E-mail can also be 
used to identify the location of a type of file you may 
wish to download from various Internet databases. 
The reply message will contain a listing of all the 
sites which contain the relevant files and the directo- 
ries in which the files reside. 

There are, however, remaining problems of differ-
ent character sets and encoding schemes to be overcome
before successful communication by e-mail in Japa-
nese and other Asian language text can be achieved.
While sending messages in the form of ASCII text will 
be mostly straightforward, handling of scripts that use 
non-ASCII encoding schemes can get very compli- 
cated. It is possible to send Japanese text via some 
e-mail networks outside Japan if it has been encoded 
according to the JIS standard and provided a number 
of other rules are followed. In some situations, how-
ever, Japanese text will be distorted en route, because
inter-network links in the English-language environ-
ment are essentially designed to carry one-byte,
seven-bit ASCII codes, and it will be neces-
sary to use special software tools to modify the text to
enable it to be carried transparently’. 
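
The kind of distortion described here can be reproduced in a few lines. The sketch below assumes that the JIS encoding mentioned above corresponds to what is now called ISO-2022-JP, which represents Japanese using seven-bit bytes only, and contrasts it with Shift_JIS, an eight-bit encoding. The seven-bit channel is simulated simply by clearing the top bit of each byte; real gateways may damage text in other ways, and the function names are invented for this example.

```python
def strip_high_bit(data: bytes) -> bytes:
    """Simulate a seven-bit channel by clearing the top bit of every byte."""
    return bytes(b & 0x7F for b in data)


def survives_seven_bit_channel(text: str, encoding: str) -> bool:
    """Encode the text, pass it through the simulated channel, decode it again."""
    wire = strip_high_bit(text.encode(encoding))
    return wire.decode(encoding, errors="replace") == text


sample = "日本語"  # "Japanese language"

# ISO-2022-JP uses escape sequences and seven-bit bytes, so nothing is lost.
print(survives_seven_bit_channel(sample, "iso2022_jp"))  # True
# Shift_JIS relies on the eighth bit, so the text arrives corrupted.
print(survives_seven_bit_channel(sample, "shift_jis"))   # False
```

This is why text sent in a seven-bit-safe encoding can cross such networks unchanged, whereas text in an eight-bit encoding must first be converted by a separate tool.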

In summary, although truly global and transpar- 
ent communications awaits resolution of a number of 
standards and compatibility related issues, the exten-
sion of telecommunications to the exchange of written
words directly between computers in the form of e- 
mail is bringing considerable convenience to the 
translation industry. 
Abstract


This paper assesses the degree to which established practices in terminology can provide the translation industry 
with the lexical means to support mediation of information between languages, especially where such mediation 
involves modification. The effects of term variation, collocation and sublanguage phraseology present problems 
of term choice to the translator. Current term resources cannot help much with these problems; however, tools and 
techniques are discussed which, in the near future, will offer translators the means to make appropriate choices 


of being developed to aid translators, perhaps indi- 
rectly, in coming to grips with variation, collocation 
and sublanguage phraseology. Here we shall find 
that there is much cause for optimism. 


Translation

We start by making a gross distinction between two 
kinds of translation: that in which the target text is a 
dependent text with respect to the source text and 
that in which the target text is a derived text with 
respect to the source text. 

Translation is often discussed from the point of 
view of determining the content, form and purpose of 
some source language message and then preserving 
this information in translation. In other words, one is
focusing on determining the meaning and form of 
some source language message, followed by finding
a link to an equivalent meaning and form in the target
language. It is seen as essential to preserve meaning
and form in translation. This is the traditional view of
translation and much translation is carried out ac- 
cording to this view, where the resultant target text is 
linguistically and communicatively equivalent to the 
source language text. We may say that a dependent 
message is produced under this view: the target 
depends entirely on the source for its content, pur- 
pose, form, etc. Many of the lexical resources 
available to translators are geared to support this type 
of translation. 

However, when we look more closely at today’s 
translation industry, we see also the increasing need 
to take into account modification of the text in trans- 
lation, where one may choose — or more commonly 
be required, via the translation specification — to vary 
some or all of content, intention and text type, to- 
gether with other textual and language-dependent 
parameters. Under this view, translation is seen as 
mediation, where mediation may, and commonly 
Introduction

Our objectives in this contribution are: 

• to discuss the current relationship between the
fields of terminology and translation;

• to assess the degree to which established practices
in terminology can be said to provide the transla-
tion industry with the lexical means required to
support appropriate mediation of information
between languages, where mediation nowadays
often implies modification;

• to demonstrate how techniques and tools being
explored by practitioners in natural language
processing could be of help in providing such
lexical means, especially in situations where no
relevant lexical resources exist.


In what follows, we take a somewhat idealized
view of translation — idealized in that although we 
realize there are many different types of translation 
and of translators, unfortunately, in this short paper, 
we cannot hope to take into account every type. We 
will thus discuss general aspects, trusting that the 
individual translator will be able to establish relevant 
links to his or her working environment. We start by 
looking at the nature of translation and the relation- 
ship of the field of terminology to translation, in turn. 
Then we consider issues related to variant realization 
of concepts in different communicative situations. 
This leads into a discussion of collocation. The no- 
tion of sublanguage is seen to be highly relevant with 
respect to terminological variation and collocation 
and we thus next discuss why awareness of 
sublanguage patterns is necessary in translation. Our 
interim conclusion is that there are no available 
terminological resources that offer the translator in- 
formative support in relation to term variation and 
collocation or to sublanguage phraseology. 

We then turn to consider which tools and tech- 
niques are available commercially or in the process 


explicitly mark such information — usually, they 
simply give lists of quasi-synonymous target words 
with few distinguishing marks. This is why transla- 
tors often need to look at monolingual dictionaries in 
both source and target language where they find 
much more information on words. However, such 
information is still largely limited to context-inde- 
pendent senses. 

What we have just described is a 2 model 
of translation in which choice of wordforms is seen
to be conditioned by the values of largely global text 
or translation parameters, derived from the specifi- 
cation. This furthermore implies a very strong lexical 
basis to translation and, moreover, essentially an 
atomic one, in which we combine atoms according to 
our parameter values. Matters are much more com- 
plex, in reality. In particular, local textual or 
translation conditions may apply; or individual words 
may themselves condition the selection of other 
words; or it may be apparently impossible to select 
one word without selecting another, and vice versa. 


Thus we are faced with a complex interplay of global
and local values. This again is a simplifying assump- 
tion, however it will serve for present purposes. 

Collocations, phraseological expressions of many
kinds, idioms and the like are the typical result of 
local conditions applying. This, as we know, is a 
thorny area for the translator, who may manage to 
construct an apparently reasonable translation, in 
respecting global parameter values, but then finds 
the result is not as elegant or idiomatic or natural as 
might be desired. 

We thus find that we need more than information 
about individual wordforms to help us translate: we 
need also information about the combinatorial possi- 
bilities of wordforms. We need much other 
information too, but we shall consider only these two 
types of information here. 

The following question then naturally arises: to 
what extent does the field of terminology give 
support to translators interested in choosing 
wordforms according to global or local parameter 
values and according to combinatorial possibili- 
ties? With this question in mind, we now turn to 
consider terminology. 


Terminology


Terminology is often discussed from the point of 
view of determining the concepts of some special 
language and establishing equivalence links between 
the concept systems of corresponding special lan- 
guages of different natural languages. Furthermore, 
with its tendency to strive for harmonization and 
normalization in the interests of more efficient com- 
munication, terminology is often concerned with 
establishing names for concepts such that, within
some subject domain, in some language, there is 
ideally but one name for one concept, or alterna- 
tively one preferred name. Moreover, terminology 
places great emphasis on nominal forms. 
does, involve modification of the linguistic and communicative
parameters given by the source text to
produce a related target language text where the 
relationship is not a direct one. We may say that a
derived message is produced, when the specifica- 
tion calls for some change with respect to the 
parameters of the source language text: the nature 
of the target does not depend directly and entirely 
on that of the source — a different, but nevertheless 
related, message is produced. Sager expounds this
contemporary view of translation as mediation and 
modification. 

Today, translation is increasingly concerned with 
choosing how best not only to mediate but also to 
modify a message with respect to the specification. 
This implies that translators are constantly faced 
with a major problem of choice, covering numerous 
parameters, each of which, in the worst case, may 
call for a choice to be made among a large set of 
possible values. We do not claim there is some ideal 
choice to be made or specified which will result in 
the perfect translation: there is room for much lati- 
tude in any mediation which does not involve strictly 
laid down forms and modes of expression approach- 
ing the formulaic and artificial ends of the language 
spectrum where there can only ever be one possible 
translation. In the general case, then, the translator is 
constrained to choose values from some set of pa- 
rameters in order to yield an appropriate target text. 
This set of choices has a direct bearing on how the 
translator textually realizes information at all lin- 
guistic levels: pragmatic, semantic, syntactic, 
morphological, phonological and graphological. 
Given that the translator must produce, at the end of 
the day, a string of wordforms in the target language, 
it is evident that the choice of each wordform (and 
indeed the dynamic construction of complex target 
forms by the translator) is potentially conditioned by 
the set of values that has been identified as the most 
pertinent with respect to the specification. 

This furthermore implies that, in order to be 
candidates for selection, target wordforms must have 
particular types of information associated with them. 
We would naturally expect to judge whether some 
target wordform is appropriate by consulting its dictionary
entry and determining the (perhaps implicit)
set of values associated with it and the closeness of 
the match between this set and the set of values 
derived from the specification. 
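
The matching step just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the parameter names (`text_type`, `function`), the placeholder wordforms and the simple proportion-of-agreement score are all hypothetical, and real dictionary entries rarely make such values explicit.

```python
def match_score(entry_values, specification):
    """Proportion of specification parameters whose value the entry shares."""
    if not specification:
        return 0.0
    agreed = sum(
        1 for name, value in specification.items()
        if entry_values.get(name) == value
    )
    return agreed / len(specification)


def rank_wordforms(entries, specification):
    """Order candidate target wordforms by closeness to the specification."""
    return sorted(
        entries,
        key=lambda wordform: match_score(entries[wordform], specification),
        reverse=True,
    )


# Hypothetical entries: values a dictionary might associate with each wordform.
entries = {
    "wordform_A": {"text_type": "full text", "function": "informative"},
    "wordform_B": {"text_type": "abstract", "function": "informative"},
}
# Hypothetical values derived from the translation specification.
specification = {"text_type": "abstract", "function": "informative"}

ranked = rank_wordforms(entries, specification)
```

Here `ranked` lists the wordform whose recorded values agree most closely with the specification first; a candidate with no recorded values simply scores zero.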

Traditional bilingual dictionaries offer a certain 
amount of support for dependent translation, as there 
is an assumption in such dictionaries that text param- 
eters are preserved (or rendered essentially context- 
neutral), in relation to the main sense equivalents. 

However, if we are carrying out a derived
translation, what aids are there to tell us that, for
example, target wordform X is the appropriate translation
of source wordform Y when we are, for example,
going from full text to abstract, or report to persuasive
marketing document? Bilingual dictionaries do not 
Variant forms

Although theoretical terminology studies largely ig- 
nore contextually-conditioned variant forms of terms, 
it is not true to say that traditional terminological 
information systems make no attempt to deal with 
variants of terms. Let us briefly review the major 
means employed in terminological resources. 

First, we note that the information categories 
described below do not necessarily all occur in some 
term resource. There is also variation in the interpre- 
tation of each category among term resources. Lastly, 
we may find that, in some actual resource, categories 
are non-exclusive: the same or similar information 
may be given under more than one category. 


Context: 

There are many kinds of context. Our interest here is 
in the kind of context which attempts to capture a 
typical or atypical use of a term, and/or shows (types
of) words that typically occur with the term. 


Scope note: 

This narrows down the area in which a concept is 
typically used, e.g. by indicating it is used only with 
reference to a particular device and is not a concept 
found in all devices of that general type. 


Usage note: 

This potentially contains a wealth of information, 
which supplements that found in the context field. 
Here we typically find details on level of language 
(colloquial, formal, etc.), in which circumstances use 
of the term is mandatory, whether the term is standardized
or not, whether it is specific to some company,
in which geographical or linguistic region it is used, 
whether it is a translation equivalent thus deprecated 
in source monolingual use, and so on. 

Synonym: 

This is usually a reference to another headword 
considered to be substitutable for the term under 
study. Such substitution may be qualified according 
to context of use. This context may, or may not, be 
clearly set out. If there is no indication of context, the 
assumption then is that the synonym is substitutable 
under the same conditions as the entry term. 


Source origin and type: 

These two categories can provide useful information 
on appropriateness, e.g. by indicating a term as being 
in use at a certain time period, by demonstrating its 
use in documents issued by a professional or govern- 
mental body, etc., by giving evidence of use in a 
certain type of text. Source type information is of 
particular help, all the more so if such information is 
attached not just to the term itself but also to contexts 
and definitions. 


Variant: 

This category usually covers the narrow area of 
orthographic variants, i.e. noting differences in 
spelling of the entry term. 


Abbreviation: 
Here we may find diverse forms such as abbrevia- 
tions, acronyms, symbols, formulae and the like 
Given such preoccupations, there is a tension
between the concerns of terminology, as classically 
perceived, and those of translation in an environment 
where choice of wordforms is an important consid- 
eration. It would appear that terminology is concerned 
with constraining choice, whereas translation ex- 
pects constantly to make choices. One might argue 
that terminology helps by offering the translator the 
right choice, for some defined situation, once the 
translator has chosen the particular situation: but to 
what extent is this the case? 

At this point, we must be careful to distinguish 
between theoretical aspects of terminology and prac- 
tical ones. The practical side of terminology is 
supported by terminology information systems such 
as term banks, terminology management packages 
and the like. These mostly do not reflect contempo- 
rary terminology theory. Progressive terminology 
theory is interested, among other things, in represen- 
tational issues, such as the design of terminological 
knowledge representations, and in determining the 
number, nature and role of terminological relation- 
ships used to link concepts in such representations. 
There are very few such knowledge based systems 
around. There are considerably more concept-ori- 
ented systems in existence, which handle simpler 
hierarchical structures based on just a few termino- 
logical relationships, and large numbers of 
term-oriented systems which essentially establish 
relationships between terms as opposed to between 
concepts. The latter can get by without proper defini- 
tions, often preferring contextual examples, whereas 
definitions play a key role in concept and knowledge 
oriented systems, as they help fix concepts in con- 
ceptual space. 

Concept representation issues are highly impor- 
tant for terminology and it appears that knowledge 
based systems offer greater information possibilities 
to the user through allowing exploration of concep- 
tual space, than do less sophisticated systems. 

Currently, there is great interest in exploring 
multidimensionality in concept systems (Bowker and
Lethbridge): that is, how one may view some concept
as belonging to more than one relational system
at the same time. Thus, a computer operating system
can be viewed as single- or multi-tasking, as portable 
or hardware-dependent, and so on. 

Such work, however, does not call into question 
the underlying tendency in terminology to prefer one 
form of a term to all others for some concept. In a 
sense, it does not need to, as its concerns are of a 
different nature. However, it is interesting to note 
that the representational and retrieval mechanisms 
typically found in modern terminology knowledge 
bases are eminently suited to handling just the kind 
of issue that is one of the subjects of this paper: the 
existence of different forms to represent what is 
essentially the same concept under different condi- 
tions, within the same subject domain. We now look 
at this topic. 
It could in fact be argued that the terminological
resources available to the translator are little better 
than those conventional bilingual dictionaries which 
give sets of ‘synonyms’ for the translation equivalent,
with very little, if any, contextual or pragmatic
usage indicators. The translator is thrown upon his or 
her own knowledge of the target language and its 
contextual and pragmatic possibilities — even if the 
target language is the mother tongue, the individual 
cannot be expected to have complete and instantly 
recallable knowledge of all contextually and prag- 
matically determined variations and moreover be 
consistent in the application of this information.

In the majority of term resources there is then 
inevitably a lack of detailed and easily accessible 
information about contextual and pragmatic condi- 
tions governing the appearance of terms in texts. 
This is due partly to design and cost factors, and to 
the requirements term resources were built to meet,
and partly to the hitherto entrenched view among 
theoreticians (where these had any influence on the 
design of some term resource) that a concept should 
have only one preferred realization no matter the 
communicative situation. Thus, even if a term re- 
source manager might wish to incorporate pragmatic 
or contextual information, there has been very little 
applied research or theoretical work that could aid in 
the satisfaction of that wish. 

Ironically, translators have always been aware of 
the need for information that would help in decisions 
about appropriateness. However, they have been ill- 
served by terminology theory in that respect (although 
well-served in others), ill-served by published bilin- 
gual dictionaries and rather confusedly and 
frustratingly served by term banks: frustratingly be- 
cause the information they seek may (or may not) be 
there, somewhere, but is difficult to track down, and 
confusedly because there is little apparent concern 
over the nature and type of information that is re- 
corded in what may well be seen as rather ancillary 
fields of information. 

There are certainly other areas which are much 
underresearched and which are hardly treated in term 
resources: a good example is that of expanded or 
reduced forms. Little work has been done on examin- 
ing under which circumstances terms appear in 
expanded or reduced form. There are numerous types 
of reduction: a form may become reduced over time, 
which can stand for the entry term. Some terminological
resources have separate fields for these forms.


Expanded and reduced forms:

These categories typically concern multi-element 
terms (compounds) and give full(er) forms of the 
entry term (elements added) or various attested — or 
indeed potential — shortened forms (elements re- 
moved). 

It is important to realize that not all resources 
provide all the above categories. What is more im- 
portant to realize is that, where such categories are 
provided in whatever measure, there may be few 
dependency links between the categories. Examples 
of dependency links are: 

synonym <-> usage note;

entry term <-> abbreviation <-> source type <-> usage note;

source type <-> expanded form, etc.


Let us take the example of

entry term <-> abbreviation <-> source type <-> usage note.

Ideally, we would require source type and usage note 
information to be attached to both the entry term and 
to each abbreviation (in its widest sense) recorded: 
we need to know under which circumstances we can 
use some abbreviated form to replace the main entry 
term. Too often, such dependency links are not in 
evidence, which means that the user is forced into 
wider searching in the resource in order to discover 
(or not) the information sought. The lack of such 
links and indeed of certain categories may be due to 
several factors: the initial design of the resource; 
particular original requirements; and so on. 
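
To make the idea of dependency links concrete, the following Python sketch attaches source type and usage note information to every recorded form rather than to the entry as a whole. It is a minimal illustration only: the class and field names are hypothetical, and the source types and usage notes given for each form are invented for the example, not taken from any actual term resource.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Form:
    """One recorded form of a term, carrying its own dependent information."""
    text: str
    source_types: List[str] = field(default_factory=list)
    usage_note: Optional[str] = None


@dataclass
class TermEntry:
    """An entry term plus its abbreviations (in the widest sense)."""
    entry_term: Form
    abbreviations: List[Form] = field(default_factory=list)

    def forms_for(self, source_type):
        """Return every recorded form attested for the given source type."""
        forms = [self.entry_term] + self.abbreviations
        return [f.text for f in forms if source_type in f.source_types]


# Hypothetical annotations, for illustration only.
entry = TermEntry(
    entry_term=Form("carrier-to-noise ratio", ["standard", "manual"], "full form"),
    abbreviations=[
        Form("C/N ratio", ["manual"], "after first mention"),
        Form("C/N", ["engineering note"], "informal, between specialists"),
    ],
)

manual_forms = entry.forms_for("manual")
```

Because each form carries its own source type and usage note, the circumstances under which an abbreviated form may replace the main entry term can be answered from the entry itself, without wider searching in the resource.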

Furthermore, simply because a category exists 
does not guarantee it will have adequate (or indeed 
any) information in it. Such information is time 
consuming to discover and compile. Many term re- 
sources contain fullish information for only a limited 
number of categories, whose recording keeps staff 
more than fully occupied. Other categories are filled 
out on an ad hoc basis.

We should also take note of the heterogeneous 
nature of information in certain categories. The us- 
age note is a case in point. Information is furthermore 
typically expressed in free text, essentially according 
to the whim of the terminologist. Ad hoc classifica- 
tions of, for example, register or linguistic region, 
‘This activity is destroyed by an anaphylatoxin
inactivator which digests the arginine.’ 

‘An anaphylatoxin inactivator occurs naturally in 
human serum.’ 

‘The disease tends to remit and to respond to
chemotherapy.’

‘The vessels become plugged with thrombi and
there is exudation of fluid rich in neutrophils into 
the surrounding tissues.’ 

‘These drugs interfere with the normal metabolic
processes.’

‘Aggressins interfere with normal defence mechanisms.’

‘An antigen is a molecule that elicits a specific 
immune response when introduced into the tis- 
sues of an animal.’ 

‘The normal tuberculin test skin reaction cannot 
be elicited.’ 

‘This hapten binds to the antigen binding site and 
bonds to amino acid residues.’ 


We also note collocation between adverb and verb: 
‘Compounds of aluminium strongly adsorb 
protein antigens.’ 

or between adjective and noun: 

‘rapid death’, ‘slow death’, ‘lowered resistance’, 
‘slow infection’. 

Note that there can be restricted variation in collocation,
as we see in the following:


subcutaneous         administration
intradermal     X    injection
intravenous          introduction


Collocation in special languages can be mark- 
edly different to general language and indeed involve 
syntactic structures that are apparently deviant with 
respect to the general language: 

‘On Monday mornings, cotton and flax workers
present with byssinosis.’

Here, not only do we have a special collocation 
involving ‘present + disease/ condition/ symptom’, 
but also a special construction ‘present + with’. This 
is one of the most frequent collocations in medically- 
related texts where the initial state of patients is 
being discussed and therefore the translator would 
wish to employ it, rather than generating ‘the workers
appeared with/ manifested the symptoms of/
came looking for treatment for’ etc.

As many of the collocates of terms (verbs, adjec- 
tives, adverbs) are not themselves considered terms, 
they will not appear in term resources in any useful 
sense — they may appear by accident rather than by 
design, in contexts or other notes, but one would not 
be able to see an extended set of collocates: think of 
element) is always preserved, no matter what other
reductions are performed. However this is not neces- 
sarily always the case, as we see in the two following 
examples, where the heads (‘ratio’ and ‘plate’) are
omitted in the reduced forms: 

‘carrier-to-noise ratio’     ‘maintenance access cover plate’
‘C/N ratio’                  ‘maintenance access cover’
‘C/N’


Collocation 

Up to now, we have discussed mainly problems 
associated with individual terms: choice of which 
form is the most appropriate given certain settings 
for text type, register and so on. We have thus viewed 
terms as isolated objects whose occurrence is condi- 
tioned by various global or local factors. 

However, the form a concept takes can be de- 
pendent on the co-occurrence of some other term(s) 
or word: here we enter the realm of what is broadly 
termed collocation. 

Collocation is pervasive in language: letters are 
delivered, soup is eaten and not drunk, perfume is 
worn, tea is strong and not powerful, and so on. 
Linguists have long been interested in collocation 
(especially British Linguistics; see e.g. Firth,
Halliday, Cruse). There has been much recent work
on collocation, especially in computational lexicog- 
raphy and computational linguistics. Investigations 
of large collections of general language texts have 
shown how important a knowledge of collocation is 
for any language user. Church et al. discuss, for
example, the similarities and differences in sense 
between collocations involving ‘strong’ and those 
involving ‘powerful’. These two words are often 
defined in terms of each other, yet one cannot simply 
replace one with the other in combination with some 
other form, for the most part. In the same spirit, 
Biber looks at collocations involving ‘certain’ and
‘sure’. 

Translators are always searching for ‘the right 
way of saying something’ — for the right collocation, 
we might say, in many instances. It is more than a 
question of being terminologically accurate, it is also 
a question of formulating a sentence or phrase so that 
it sounds as if it belongs to the type of language under 
study. As in general language, so in special lan- 
guages, in fact even more so: special languages 
(sublanguages) have remarkably high incidences of 
collocation, as is apparent from a brief scan of any
special language collection. The collocations are 
here highly distinctive in characterizing the lan- 
guage of the field. 

Collocation is often seen between verbs and nouns. 
Here are some examples, drawn from a collection of 
immunology texts, where one may easily spot 
niques which will deliver, as collocations, among
other things, somewhat trivial combinations of words,
or combinations of trivial words. We might find
combinations which have low frequency of occur- 
rence yet are still highly significant as collocations. 

Entire phrases, sentences and paragraphs have 
also been treated by some as collocations. This links 
in with our comments on collocation in sublanguage 
where we see highly syntactified structures are pre- 
dominant. As soon as we expand our view of 
collocation from, say, pairs of words to numerous 
words combining and cooccurring, it is then a small 
step, in sublanguage texts, to consider entire text 
units as collocations. However, we should be careful 
to distinguish between: 

1. fixed phrases such as idioms; 

2. formulaic repetition in certain text types dealing 
with certain subjects, where certain pieces of in- 
formation are relatively constant; 

3. instances of sublanguage specific sentential or 
phrasal templates. 

We have nothing to say, here, regarding idioms. 
Examples of formulaic repetition include regularly 
issued bulletins where, for example, titles, captions 
and whole pieces of other text stay constant for every 
issue at some time period, although they appear to 
change from day to day or hour to hour:

‘Weather report for the 12 hours ended 6pm Tuesday’

‘Weather report for the 12 hours ended 6am Wednesday’.

Such repetition is also well known in, for example,
administrative and legal texts. There is however a 
matter of degree of mandatoriness to consider: a for- 
mulaic statement that is changed (for example in 
translation) in a weather bulletin may cause less of a 
problem than one changed in a legal document. This 
indicates that text type, subject matter, content and 
intention must be taken into account when one wishes 
to say anything meaningful about formulaic repetition. 

As for instances of sublanguage specific sentential 
or phrasal templates, here we may refer once again to 
the syntactified nature of lexical selection in 
sublanguages. Certain combinations crop up again 
and again. Above, we considered all of these as 
collocations. However, if we continue to do so, we 
will miss valuable generalizations: we will miss the 
patterns inherent in the text. Work in this area has 
been going on for many years, largely ignored by the 
natural language processing community. For exam- 
ple, medical sublanguage has been the subject of a 
long-term effort at New York University (Sager).
Numerous patterns and templates have been identi- 
fied and used to build information systems and other 
natural language applications. These patterns are 
typically centred on verbs and have thus much in 
common with attempts by general linguists to de- 
scribe the argument structure or frames of verbs. 
However, the potential number and nature of the 
verbal arguments in sublanguage frames are often 
to be preferred, but in a message from consultant to
hospital trust board member, the expression ‘sought
treatment for’ might be preferable?

One might say there is no substitute for experi- 
ence and extensive knowledge of the subject domain, 
but every translator needs relatively more or less 
help with choosing or indeed identifying appropriate 
collocations at times throughout his or her career. 

Frawley neatly sums up the nature of sublanguage,
showing the key contribution of collocation:

1. sublanguage is strongly lexically based; 

2. sublanguage texts focus on content; 

3. lexical selection is syntactified in sublanguages; 

thus 

4. collocation plays a major role in sublanguages; 

5. sublanguages demonstrate elaborate lexical 
cohesion. 

The particular structures found in sublanguage 
texts reflect very closely the structuring of a 
sublanguage's associated conceptual domain. It is 
the particular syntactified combinations of words 
that reveal this structure. Techniques which allow us 
to establish measures of association between the 
wordforms of sublanguage texts can reveal much 
about collocational behaviour and semantic classes. 
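
As a rough illustration of such a measure of association, the Python sketch below computes pointwise mutual information (PMI) for adjacent word pairs. PMI is our choice for the example; the text above does not commit to any particular measure, and the two-sentence corpus (taken from the immunology examples quoted earlier) is far too small for the scores to mean anything in practice.

```python
import math
from collections import Counter


def tokenize(sentence):
    """Lower-case a sentence and strip simple punctuation from each token."""
    tokens = (t.strip(".,;:'\"") for t in sentence.lower().split())
    return [t for t in tokens if t]


def pmi_scores(sentences):
    """Return {(w1, w2): PMI} for every adjacent word pair in the sentences."""
    word_counts = Counter()
    pair_counts = Counter()
    for sentence in sentences:
        tokens = tokenize(sentence)
        word_counts.update(tokens)
        pair_counts.update(zip(tokens, tokens[1:]))
    n_words = sum(word_counts.values())
    n_pairs = sum(pair_counts.values())
    scores = {}
    for (w1, w2), count in pair_counts.items():
        p_pair = count / n_pairs          # observed probability of the pair
        p_w1 = word_counts[w1] / n_words  # probability of each word alone
        p_w2 = word_counts[w2] / n_words
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    return scores


corpus = [
    "These drugs interfere with the normal metabolic processes.",
    "Aggressins interfere with normal defence mechanisms.",
]
scores = pmi_scores(corpus)
```

On this toy corpus the recurring pair ‘interfere with’ scores higher than the one-off pair ‘with normal’. Note, however, that unadjusted PMI also rewards pairs of words that each occur only once, so on real data some frequency threshold or significance test is normally applied as well.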

The lesson to be drawn from study of sublanguage 
texts is not only that collocation is important, but that 
it is essential as a communicative device: it carries 
greater communicative weight than in general lan- 
guage. 

Up to now, we have avoided defining what we (or 
others) mean by 'collocation'. Unfortunately, defini- 
tions of collocation are numerous and varied. Some 
researchers include multi-element compounds as 
examples of collocations; some admit only colloca- 
tions consisting of pairs of words, while others admit 
collocations consisting of up to, say, five or six 
words (there may be intervening material); some 
emphasize syntagmatic aspects, others semantic as- 
pects. The common points regarding collocations 
appear to be, as Smadja suggests: they are arbitrary,
they are domain-dependent, they are recurrent and 
lastly the occurrence of one word (or more) strongly 
influences the occurrence of others. 

It is not a goal of this paper to offer yet another 
definition of collocation. However, what we can 
observe is that there is, from a terminologist's point 
of view, little advantage to be gained in viewing 
multi-element compound terms as collocations: they 
are terms, with all that this implies. The fact that their 
elements may occur in combination may be useful as 
one of the guides to recognition of unknown com- 
pounds; however to characterize multi-element 
compound terms as collocations is, in our opinion, to 
fail to recognize their special nature as terms. Multi- 
element compounds may however be quite well 
characterized as collocations in general language: 
but that is the subject of a different paper. 

In our view, there is equally little to be gained 
from applying straightforward frequency based tech- 
this further means that sublanguage texts tend to use
the same means to talk about the same things a lot of 
the time: to the general linguist, this then gives the 
impression of collocation at work. However, we may 
go a step further and say that there are underlying 
patterns and templates at work that characterize the 
semantic, conceptual nature of sublanguages. Such a 
realization then allows us to collapse whole series of 
apparently different patterns into similar patterns: at 
the collocational level, such generalization would be 
missed. For example, in the NYU research referred 
to above, words (largely terms) were grouped suc- 
cessfully into word classes, which then give rise to 
semantic classes that can be used to build frames. A 
simplified example is the ‘General Medical Management’
frame:

[INSTITUTION PATIENT MANAGE VERB]

where

INSTITUTION has as members: ‘cardiology’, ‘clinic’,
‘casualty’, ‘hospital’, ‘lab’, ‘outpatients’, ...

PATIENT has as members: ‘patient’, ‘pt’
(abbreviation), ‘she’, ‘he’, ...

MANAGE VERB has as members: ‘admit’, ‘diagnose’,
‘discharge’, ‘evaluate’, ‘transfer’, ...

This then allows one to recognize or synthesize 
sentences such as: 

‘patient was admitted to hospital’ 

‘pt was transferred to outpatients’ 

and so on: the kind of phraseology that occurs 
time and again in medical reports and that should 
thus be desirably reflected in a translation. 
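
A frame of this kind can be sketched in Python as follows. The word classes are those listed above; everything else is an assumption made for illustration: the key `MANAGE_VERB` is simply a Python-friendly spelling of the class name, and the passive surface template with its participles and prepositions is a hypothetical realization, not part of the NYU work.

```python
WORD_CLASSES = {
    "INSTITUTION": ["cardiology", "clinic", "casualty", "hospital", "lab",
                    "outpatients"],
    "PATIENT": ["patient", "pt", "she", "he"],
    "MANAGE_VERB": ["admit", "diagnose", "discharge", "evaluate", "transfer"],
}

# Hypothetical surface realizations: verb -> (past participle, preposition).
PASSIVE = {
    "admit": ("admitted", "to"),
    "transfer": ("transferred", "to"),
    "discharge": ("discharged", "from"),
}


def synthesize(patient, verb, institution):
    """Build 'PATIENT was VERB-ed PREP INSTITUTION' from frame fillers."""
    fillers = {"PATIENT": patient, "MANAGE_VERB": verb,
               "INSTITUTION": institution}
    for slot, word in fillers.items():
        if word not in WORD_CLASSES[slot]:
            raise ValueError(f"{word!r} is not a member of {slot}")
    if verb not in PASSIVE:
        raise ValueError(f"no surface template recorded for {verb!r}")
    participle, preposition = PASSIVE[verb]
    return f"{patient} was {participle} {preposition} {institution}"


def recognize(sentence):
    """Return the frame fillers if the sentence fits the template, else None."""
    tokens = sentence.lower().split()
    if len(tokens) != 5 or tokens[1] != "was":
        return None
    patient, _, participle, preposition, institution = tokens
    if patient not in WORD_CLASSES["PATIENT"]:
        return None
    if institution not in WORD_CLASSES["INSTITUTION"]:
        return None
    for verb, surface in PASSIVE.items():
        if surface == (participle, preposition):
            return {"PATIENT": patient, "MANAGE_VERB": verb,
                    "INSTITUTION": institution}
    return None


sentence = synthesize("patient", "admit", "hospital")
fillers = recognize("pt was transferred to outpatients")
```

The same frame thus serves both recognition and synthesis: any sentence whose fillers belong to the right classes is accepted, and any permitted combination of fillers yields an acceptable variant of the pattern.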

In the foregoing sections, we have examined 
issues of choice concerned with terms: we have 
explicitly or implicitly considered choice at the 
authoring stage as well as at the translation and 
generation stages. We have seen that choice, in 
sublanguage texts, involves being aware of commu- 
nicative context, of text type, intention, translation 
specification and so on. A concept may be realized in 
different forms, related or not to some base form, 
depending on such factors of the translation environ- 
ment. We have also seen that the choice of certain 
terms and their syntagmatic positioning can be and 
often is highly dependent on the occurrence of other 
terms. This latter phenomenon can be approached as 
involving collocation; it is a special kind of concep- 
tual collocation, though, as there are clear underlying 
patterns detectable, which allow us to describe, in 
abstract form, many apparently different colloca- 
tions as manifestations of a few (often simple) 
constructions. It is this abstract view that, in turn, 
allows us to choose, within acceptable parameters, 
appropriate surface realizations. In other words, we 
can know the preferred modes of expression and be 
able to introduce variety in our target text: variety 
which remains within the bounds of acceptability 
with respect to the particular sublanguage we are 
using. 

It is all very well to discuss notions of choice - 
this is a topic about which translators need to be told
quite different from those discussed by general language
linguists. It is often argued by detractors that
medical sublanguage lends itself remarkably well to 
such interpretations. However, we find the same 
kind of patterning in other sublanguages. For exam- 
ple, a satellite telecommunications verb frame might 
be: 
TRANSMIT[SIGNAL_SOURCE, SIGNAL, FREQUENCY, SIGNAL_DESTINATION, MEDIUM]

‘The satellite transmitted a test signal on 100MHz
to the ground station through free space.’

A verb frame for the same sentence constructed 
by a general linguist might be: 

TRANSMIT[AGENT, PATIENT] (where patient here is to be
read as ‘entity undergoing some action’)

Thus, for a general language linguist, all the 
prepositional phrases would constitute adjuncts (or 
circumstantials) which would be seen as having
little central role to play, whereas for the sublanguage 
specialist, the prepositional phrases are critical: they 
are arguments (perhaps optional, perhaps not) of the 
sublanguage verb and serve to indicate links be- 
tween concepts governed by the verb transmit. Note 
that the following would never be construed as a 
sublanguage usage of transmit, even though it has 
apparently the same structure, the same functions 
and dependencies: 

‘The European Court transmitted its brief opinion
to the British Government on a low-loader
lorry through Belgium’.

One might reasonably detect a collocation here 
(‘transmit an opinion’), however the difference be- 
tween the two sentences lies exactly in the 
terminological density and patterning of the 
sublanguage sentence, where domain concepts are 
linked together by the sublanguage verb to form a 
meaningful conceptual statement: meaningful in the 
sublanguage. Note also that variation in sublanguage 
sentences is usually quite restricted: 

? ‘It was through free space that...’

? ‘It was on 100MHz that...’

? ‘To the ground station, through free space, was
transmitted...’

Also, sublanguage nonsense can be obtained 
which might appear quite acceptable in general lan- 
guage: 
sublanguage sense: ‘We washed the polypeptides in 
hydrochloric acid.’ 
sublanguage nonsense, general language sense: ‘We
washed the hydrochloric acid in polypeptides.’

This example is due to Harris, who points out
that we cannot exclude the second sentence from the 
general language, where some metaphoric meaning 
could be intended. However, this sentence would 
never occur in sublanguage texts. 
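
The contrast between the two transmit sentences can be sketched in code. The following is only an illustrative sketch: the frame is the TRANSMIT frame given above, but the lexicon `TERM_CLASS` and the function `fits_frame` are hypothetical names introduced here, and a real sublanguage lexicon would be far larger and would allow optional arguments.

```python
# Hypothetical lexicon: the domain class assumed for each sublanguage term.
TERM_CLASS = {
    "satellite": "SIGNAL_SOURCE",
    "test signal": "SIGNAL",
    "100MHz": "FREQUENCY",
    "ground station": "SIGNAL_DESTINATION",
    "free space": "MEDIUM",
}

# Slots of the sublanguage verb frame, in the order given in the text.
TRANSMIT = ("SIGNAL_SOURCE", "SIGNAL", "FREQUENCY",
            "SIGNAL_DESTINATION", "MEDIUM")


def fits_frame(frame, fillers):
    """True if every filler is a known term of the class its slot requires."""
    return len(fillers) == len(frame) and all(
        TERM_CLASS.get(filler) == slot for slot, filler in zip(frame, fillers)
    )


satellite_sentence = ["satellite", "test signal", "100MHz",
                      "ground station", "free space"]
court_sentence = ["European Court", "brief opinion", "low-loader lorry",
                  "British Government", "Belgium"]

print(fits_frame(TRANSMIT, satellite_sentence))  # sublanguage usage
print(fits_frame(TRANSMIT, court_sentence))      # same structure, no domain terms
```

Both sentences share the same surface structure; only the first fills every slot with a term of the required domain class, which is the terminological patterning the text describes.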

Such restrictions, not just syntactic but also lexi- 
cal, morphological, etc., mean that there are fewer 
possibilities for expression in sublanguages. Com- 
bined with great conceptual, terminological density, 
Translation memory systems

We will now look in greater detail at a more sophis- 
ticated type of tool that has been available on the 
market for some time now and consider the help that 
translation memory systems can offer. 

Translation memory systems have become popu- 
lar recently, with several systems on the market. 
These rely on the existence of previously translated 
texts, i.e. on corresponding source and target texts.
They firstly process these data in order to, for exam- 
ple, align phrases and establish links between 
corresponding words and phrases (one may also, 
typically, build up a translation memory incremen- 
tally as one goes about translating). They exploit 
pattern matching techniques to discover phrases in 
their structured memory which are identical or close 
to some phrase the translator has selected in the 
source text he or she is working on, and then display 
the associated translations. The translator can then 
choose to incorporate what is offered, or not. Maxi- 
mum benefit is gained when texts being translated 
are highly similar to previously seen texts. This is the 
case for successive versions of a manual for some 
device, for example, when the bulk of the material 
does not change from version to version. Systems 
often offer, in addition, integrated terminology man- 
agement packages. Systems on the market include 
TM/2 (IBM), Eurolang Optimizer (SITE/Eurolang) 
and Translator's Workbench (Trados) with their at- 
tendant utilities. To what extent can translation 
memory techniques help with variant term choice or 
collocation and phraseology choice? Insofar as one 
is able to look at patterns in one language and 
translationally (partially) equivalent patterns in an- 
other, they do indeed help. If one is able to specify 
detailed control information for archive texts (type 
of text, author, date, company, subject domain), then 
such information can be used to impose a ranking on 
retrieved matches. If the system can exploit an asso- 
ciated terminology resource, then possibly the 
translator can browse through variant term forms for 
both source and target language segments. One cannot,
however, as yet expect too much of such systems.
It appears that integrated terminology management 
packages are used more often than not by translators 
themselves to record term-term correspondences that 
the translation memory proper has not yielded. 
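
The retrieval step described above can be illustrated with a minimal sketch. It assumes a memory held as a plain list of (source, target) pairs and uses the character-based similarity ratio from Python's standard `difflib` module; commercial systems use their own matching algorithms, and the function name `best_matches`, the threshold of 0.7 and the sample segments are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def best_matches(segment, memory, threshold=0.7):
    """Return (score, source, target) triples for stored source segments
    whose similarity to `segment` reaches `threshold`, best match first."""
    scored = []
    for source, target in memory:
        score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
        if score >= threshold:
            scored.append((score, source, target))
    return sorted(scored, key=lambda item: item[0], reverse=True)


# Illustrative memory of previously translated segments (invented examples).
memory = [
    ("Press the red button to stop the device.",
     "Appuyez sur le bouton rouge pour arrêter l'appareil."),
    ("Clean the filter once a month.",
     "Nettoyez le filtre une fois par mois."),
]

hits = best_matches("Press the green button to stop the device.", memory)
for score, source, target in hits:
    print(f"{score:.2f}  {source}  ->  {target}")
```

Note that the sketch offers the stored translation containing ‘rouge’ for a source segment containing ‘green’: the match is purely formal, and it remains the translator's decision whether to incorporate what is offered.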

Furthermore, one must be careful in distinguish- 
ing between statistically or probability based pattern 
matching and linguistic or interpretative pattern 
matching. A translation memory has no knowledge 
that a form may be a term, unless it is told so (e.g. via 
explicit annotation in an associated terminology re- 
source — these might more properly be called 
wordform resources). All it sees is patterns standing 
in some alignment relation; it can determine close- 
ness of match to some given string according to 
probabilistic, statistical and positional information. 
The selection of the string to match is in the end up to
the user: a string can be any arbitrary sequence of 
few facts. We may have helped shed some theoretical
and practical linguistic light on certain issues, to
do with terminology and sublanguage. However, 
what is of key interest is: how can one discover and 
exploit such information in practical ways? What 
tools and resources are there to help the translator 
make the appropriate choice in some circumstance? 
We turn now to consider these points. 


Tools and techniques 

First of all, we may note once more that there is a great
lack of information of the kind we have been discuss- 
ing — there are vanishingly few lexical resources that 
store such information in a formal, easily searchable 
and retrievable way. We have seen that most term 
banks cannot help out. Contextual variation in termi- 
nology is not handled well in term banks. As for 
collocation, we have in fact come across only one 
term bank which has been explicitly constructed to 
store collocational information in a formal way, for 
multilingual purposes. This is a term bank at Krupp 
Industrietechnik GmbH, based in part on Hausmann's 
theory of collocation (see Freibott and Heid for a 
description of this bank and e.g. Hausmann).

If the translator cannot find the information in 
term banks or dictionaries then he or she must look to 
means to enable him or her to discover such informa- 
tion, to tools or techniques that can be directly or 
indirectly used. In the last few years, large scale 
processing of texts, in the form of ad hoc collections 
or deliberately designed corpora, has become wide- 
spread in computational lexicography and natural 
language processing. The reasons for this need not 
detain us. User requirements of corpora for natural 
language processing are discussed in McNaught.
Briefly, in order to build better natural language 
processing systems or dictionaries, we need to proc- 
ess large bodies of text to discover facts about 
language. Much of this work is done by applying 
various tools, mainly relying on statistical and prob- 
ability-based techniques. 

We will now look at several types of tool and 
techniques. Of necessity, our discussion will be brief: 
our aim is not to offer an exhaustive catalogue of 
potentially useful tools, but to draw the attention of 
translators to types of tool that offer help with prob- 
lems of term choice, collocation and sublanguage 
phraseology. 


'Key Word In Context' tools 

A clearly useful type of tool, of which there are many
instances on the market, is that which produces a Key 
Word In Context (KWIC) output. Such tools have 
been around for many years and form a standard 
utility for anyone interested in processing text to 
discover, in a limited way, lexical and collocational 
regularities and associations. In passing, we mention 
that inverse KWIC tools also exist: by a simple 
transformation, these show different words that ap- 
pear in the same context. 
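
To show how little is needed to produce such output, here is a minimal KWIC sketch. The function name `kwic`, the context width and the sample text are assumptions made for illustration; commercial concordancers offer sorting, filtering and much else besides.

```python
import re


def kwic(text, keyword, width=30):
    """Return one concordance line per occurrence of `keyword`,
    with up to `width` characters of context on either side."""
    text = " ".join(text.split())  # normalize whitespace and line breaks
    pattern = r"\b" + re.escape(keyword) + r"\b"
    lines = []
    for match in re.finditer(pattern, text, flags=re.IGNORECASE):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so that the keywords line up.
        lines.append(f"{left:>{width}} [{match.group(0)}] {right}")
    return lines


sample = ("The satellite transmitted a test signal on 100MHz to the ground "
          "station. The ground station amplified the signal before relaying it.")

for line in kwic(sample, "signal"):
    print(line)
```

Aligning the keyword in a fixed column is what makes recurrent neighbours, and hence candidate collocations, visible at a glance.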
out and pull such tools off the shelf, we nevertheless
discuss them here as:

• translation and terminology organizations may be
interested in commissioning implementations of
the more straightforward techniques;

• several projects throughout the European Union
and in the USA have been launched to provide
various kinds of corpus exploration tools.

Regarding the latter, these are as yet at early stages
of development but we should see the results of these 
projects being eventually commercialized. Thus, it is 
good to know in advance of their existence. The EC in 
particular has been instrumental in supporting re- 
search into the development of corpus processing 
tools, in the framework of the Linguistic Research and 
Engineering programme, run out of DG XIII in Luxembourg.
Cencioni and Klein give synopses of current
LRE projects, which are conducted on a collaborative 
basis between industry and academia, sponsored by 
the EC. The most relevant of these projects in the light 
of our topic are: DELIS, COMPASS, MULTEXT, 
TRANSLEARN, TRANSTERM and GIST. There 
are numerous other LRE projects dealing with other 
aspects of language engineering that may equally 
interest the reader. In the UK, the Speech and Language
Technology (SALT) programme of the EPSRC/
DTI has supported collaborative industry-academia 
projects such as the British National Corpus Initiative,
ACRONYM (collocation retrieval of thesaurally
related items) and DRAFTER (assistant for technical
writers to produce drafts in English or in French). 
The reader is advised to contact Dr Peter Lee, Department
of Trade and Industry, 151 Buckingham
Palace Road, London, UK, SW1W 9SS for further
details of these projects. 

In the USA, there is a major corpus project at the 
University of Pennsylvania, which as well as build- 
ing corpora is developing numerous tools to explore 
them (Marcus et al.). Almost every corpus project
is engaged in building tools to process their texts, 
there being few suitable tools on the market. 

The spin-off from all these projects should there- 
fore be significant in terms firstly of tools to explore 
corpora or text collections and, eventually, gram- 
mars, dictionaries, resources and full-blown natural 
language processing systems and other aids built on 
the results of all this corpus work. 

Our discussion will remain general as we wish 
rather to point out potentially useful techniques. 
Fortunately, the contribution by Erlandsen in this 
volume describes one of the few commercially avail- 
able tools able to offer flexible means of processing 
data to yield substantial information on cooccurrence 
phenomena. Hopefully, our comments will enable 
the reader to appreciate the general nature of the type 
of technology involved in such tools. 


Tools and techniques for collocation


As we have intimated, much research has been going 
on in this area recently, inspired mainly by statistical 
characters, in effect. Also, the system has no real
knowledge about the nature of the relationship 
between source and target segments, beyond the fact 
that one has been used as a translation of the other. 
This is not to say that the target segment is an 
appropriate translation in any way. If, for the sake of 
discussion, previously translated texts were translated 
by a person who had little knowledge of the termi- 
nology and phraseology of some sublanguage, then 
it will be segments of the result of such translation 
that the user will see. In such a situation, if the user 
has likewise little knowledge of that sublanguage's 
terminological and phraseological behaviour, then 
he or she will get no true help from the system and, if he or
she accepts what is proposed, will merely propagate 
an inappropriate translation. Thus, the usefulness of 
translation memories is directly linked to the quality 
of the previous translations they are dependent on. 
We certainly do not deny their clear utility; we
merely point out that, if one is using one's own 
previous translations as a source of information, this 
will be helpful only if one is happy with the quality of 
one's previous work and has some means of ascer- 
taining its appropriateness to the task in hand. 

This is not so much a criticism of translation 
memories, which can indeed help greatly in the 
translation task. It is more a reminder that the func- 
tionality of any tool must be carefully studied with 
reference to the translation environment it is being 
considered for (and also that existing environments 
could well bear re-appraisal in the light of tool 
functionalities on offer). For those who are inter- 
ested in adequacy evaluation of translation memories 
and indeed translators’ aids in general, we recommend
study of EAGLES. The EC sponsored Expert
Advisory Group for Language Engineering Stand- 
ards is working on promotion of de facto standards in 
a number of areas, including adequacy evaluation of 
translators' aids. At the time of writing, a substantial 
draft report is available, for comment by and feed- 
back from the community. It is intended to publish 
recommendations for de facto standards in the Autumn
of 1995. For further information, the reader is
advised to contact the EAGLES Secretariat, 
Consorzio Pisa Ricerche, Piazza A. D’Ancona 1, 
56127 Pisa, Italy. 


Prototype tools and techniques 

In the following sections we shall discuss tools that 
have not as yet appeared on the market: the tech- 
niques on which they are based are either of recent 
date or have been recently adopted and adapted from 
other areas, chief among which is the area of infor- 
mation retrieval. In many cases, we are thus dealing 
with prototype systems or with techniques which 
could be usefully applied to our problem-area after 
further development. However, other techniques 
could find rapid application with a minimum of work 
by a competent programmer. Thus, while we realize 
the translator or terminologist may not be able to go 
audience in the interests of encouraging wider use of
these techniques. 

Even more interesting results may be obtained if 
the objects being compared are words labelled with 
parts of speech — automatic morphosyntactic tagging 
of large text collections is an entire activity in itself 
that we gloss over here: most of the corpus-related 
projects we have mentioned are developing or have 
developed such tools. Previously, we were considering
raw wordforms and thus could not distinguish
between ‘bank’ (NOUN) and ‘bank’ (VERB), or
‘to’ (PREPOSITION) and ‘to’ (INFINITIVAL
MARKER) for example. However, once we know 
the part of speech of wordforms, we can then pro- 
duce more precise information about the specific 
behaviour of wordforms and be able to distinguish 
between homonyms. 

It is also possible to look for collocations on the 
basis of syntactic structure: there are tools offering 
skeletal parses (syntactic analyses) of texts which 
trade accuracy for robustness and rapidity; as we are
interested mainly in gross syntactic structure, then 
their output is valuable. We can thus determine, by 
applying statistical techniques on the results of such 
parsers, the typical objects of certain verbs, or the 
typical verbs of certain subjects, and so on. 

Church et al. and Smadja, among others, demonstrate
how statistical techniques may be combined
with linguistic information to yield collocational 
information. The methods each use are different. As 
we saw, Church and his colleagues work with combi- 
nations of two words; Smadja’s work, in addition, 
offers the possibility of looking for collocational 
behaviour in combinations involving up to thirty 
words.

It should be noted that the two measures we 
mentioned are not panaceas, neither singly nor to- 
gether. Many statistical techniques of this type are 
affected by the sparse data problem, for example, or 
yield certain amounts of rather odd results. We can 
attempt to mitigate these effects by introducing lin- 
guistic filtering, e.g. via tagging text with part of 
speech labels; however such effects will always re- 
main to some extent, depending largely on the nature 
of our texts. It should furthermore be noted that most 
of the work in this area has been concerned with 
processing large scale text collections of general 
language. This is not to deny the usefulness of the 
techniques discussed for terminology: we simply 
warn the reader to be sensitive to the current general 
language orientation of the technology and yet not be 
dismissive of it because of that orientation. 


Term recognition tools and techniques 

In our discussions so far, we have quietly glossed 
over a very important aspect of term choice: how to 
know in the first case that we are dealing with terms
as opposed to general language words. In order to 
know that some form is a variant of a term, if we have 
no prior record of it, we must be able to recognize it 
and probability based algorithms found in information
retrieval.

Favourite techniques involve the use of measures 
of similarity such as Mutual Information, or of dis- 
similarity such as t-score. Church et al. provide a 
clear and informative discussion of the use of these 
two measures in lexicography. Further exposition is 
provided in Charniak. Mutual Information operates
with pairs of words as follows: it considers how 
probably one might come across the pair together, 
how probably one might find each member of the 
pair on its own (without the other, i.e. by chance), 
then compares these probabilities and yields a value 
which denotes the strength of the association. One 
may successfully determine strong associations, un- 
interesting associations and pairs whose members 
are essentially in complementary distribution. One 
may thus rank all combinations of some word with 
all others, determine a threshold value and consider 
associations above that threshold to be relatively 
strong for that word. 
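
The comparison just described is commonly written as I(x,y) = log2( P(x,y) / (P(x) P(y)) ). The sketch below is a simplified illustration under stated assumptions: it considers only immediately adjacent pairs, estimates all probabilities by dividing counts by the number of tokens, and uses the hypothetical function names `mutual_information` and `strong_associations`. Published work typically uses a window of several words and much larger corpora.

```python
import math
from collections import Counter


def mutual_information(tokens):
    """Map each adjacent word pair (x, y) to log2(P(x,y) / (P(x) * P(y)))."""
    n = len(tokens)
    word_freq = Counter(tokens)
    pair_freq = Counter(zip(tokens, tokens[1:]))
    scores = {}
    for (x, y), f_xy in pair_freq.items():
        p_xy = f_xy / n
        p_x = word_freq[x] / n
        p_y = word_freq[y] / n
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return scores


def strong_associations(scores, word, threshold):
    """Rank the pairs beginning with `word` and keep those above `threshold`."""
    ranked = sorted(
        ((pair, value) for pair, value in scores.items() if pair[0] == word),
        key=lambda item: item[1],
        reverse=True,
    )
    return [(pair, value) for pair, value in ranked if value > threshold]


tokens = ["a", "b", "a", "b", "c", "d"]
scores = mutual_information(tokens)
print(scores[("a", "b")])  # equals log2(3) for this toy input
```

A value near zero suggests an uninteresting association, a large positive value a strong one, and a clearly negative value a pair whose members tend to avoid each other. With very small counts the estimates are unreliable, which is one form of the sparse data problem.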

With measures of similarity, it is not so easy to 
determine the difference between two words which 
are close in meaning, by looking at the pairs which 
each participate in. That is, it is easier to find evi- 
dence to support some hypothesis than to find 
evidence against it: it is difficult to determine what 
words do not occur after some given word. We suffer 
from lack of evidence or uncertainty about whether 
our evidence is adequate. Our lack of evidence might 
simply be due to not having processed enough data 
or to having used the wrong technique. Thus, Mutual 
Information has its limits. 

However, we can employ a measure of dissimi- 
larity, such as t-score, to help us determine to what 
degree closely related words differ. This measure 
utilizes the notion of the null hypothesis (i.e. that 
there is no difference): we first compare the prob- 
ability of word X occurring with word Z against that 
of word Y occurring with word Z. Then we ask what 
likelihood there would have been of observing any 
difference between the probabilities if the difference 
had in fact been zero. If we find this likelihood to be 
significantly low (less than 1 chance in 20) then we 
can reject the null hypothesis.
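
A minimal sketch of this test follows. It uses a widely quoted approximation in which the t-score is the difference between the two co-occurrence counts divided by the square root of their sum; this assumes both counts come from the same corpus and that the variance of each count can be approximated by the count itself. The function name, the example counts and the critical value of 1.65 (the approximate one-sided 5% level for large samples) are assumptions made for illustration.

```python
import math

# Assumed critical value: roughly the one-sided 5% level (1 chance in 20)
# for large samples.
CRITICAL_VALUE = 1.65


def t_score(f_xz, f_yz):
    """Approximate t-score for the difference between the number of times
    word X occurs with word Z and the number of times word Y occurs with Z."""
    if f_xz + f_yz == 0:
        raise ValueError("no observations for either pair")
    return (f_xz - f_yz) / math.sqrt(f_xz + f_yz)


# Invented counts: X occurs with Z 50 times, Y occurs with Z 10 times.
t = t_score(50, 10)
print(round(t, 2))
print(abs(t) > CRITICAL_VALUE)  # True here: reject the null hypothesis
```

A large positive score indicates that Z is characteristic of X rather than Y, and a large negative score the reverse; scores near zero give no grounds for rejecting the null hypothesis.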

Mutual information and t-score give different, 
but complementary, results. They can only be used to 
examine the association between pairs of words; 
however they can nevertheless give very useful in- 
formation about collocation of not only nouns and 
nouns, or nouns and adjectives, but also, for exam- 
ple, verbs and prepositions. 

Programs to apply these measures are straightfor- 
ward to write, especially in environments which 
offer powerful utilities as standard (as does, for 
example, the Unix (TM) operating system). Church
is a brief tutorial containing short yet complete and 
fully operational programs (rarely over one page) to 
implement these measures and other similar ones, 
which was given to a largely non-computational 
Text type analysis

Regarding computational analysis of text type and 
communicative contexts, this too is an area that is 
attracting greater interest from researchers. As yet, 
much of the work is focused on general language. 
Important results in this area are due to Biber, who
moreover discusses how his techniques could be 
extended to specialized texts. Translators and 
terminologists can look forward to further develop- 
ments in this area which will directly affect their 
work, as these will provide the means to discover 
information regarding the functional nature of text
types, the communicative role of various modes of 
expression and so on. 


Conclusion 

The reader can thus appreciate that there is much 
research going on into applying various techniques 
to the processing of collections of texts to yield 
information about the behaviour of wordforms. Ex- 
perience gained from using the prototype tools we 
have described will undoubtedly feed into the con- 
struction of commercially available tools to aid in the 
extraction of knowledge about how terms behave in 
context. This can only be to the benefit of translators 
and terminologists. 

Eventually, term resources will hopefully offer 
the means to store, and search for, variant termforms, 
collocations and sublanguage phraseology. How- 
ever, at present one can only indulge in self-help, 
although the tools and techniques described are a 
definite aid and their relevance to the translator 
should not be ignored. 

In closing, we wish to make a final methodologi- 
cal point. There is certainly a strong temptation to 
process paired source language original and target 
language translated text, given that quantities of such 
‘parallel corpora’ exist. There is apparently an equally
strong belief that such processing will yield good 
quality terminological data, collocational and phra- 
seological information that will then be of use to 
translators. We are not so convinced of this, as we 
have hinted at earlier, as one cannot be at all sure of 
the quality of the translation and particularly whether
it did indeed respect the target language constraints 
on phraseology, collocation and term choice. Furthermore,
the processing of paired texts will not help
overmuch with translation situations where modification
of the message is called for: as far as can be
seen, most parallel corpora are of the dependent 
translation type we mentioned at the beginning of 
this paper. 

We believe that, if one wishes to arrive at the best 
possible information on collocational, phraseologi- 
cal and terminological behaviour, it is paramount to 
process original texts in the target language, rather 
than translated texts. The tools and techniques we 
have discussed are entirely usable to this end. Arntz
reminds us that, in comparative terminology, one 
does not work with translations, but with original 
as having terminological status. In order to detect
special language collocational behaviour involving 
nominal terms and verbs, we need to also know that 
the nouns we are investigating are terms, especially 
if we have no prior information on these forms. It 
might be supposed that the statistical and probabilistic 
techniques we have looked at could help in the 
recognition of terms. To a certain extent they do;
however, often forms that are terms are not picked up
by associative techniques, as no evidence can be 
found to propose a strong association among the 
elements of, for example, multiword compound terms. 
Straightforward counting of frequency of occurrence 
can help (on the hypothesis that frequently occurring 
forms should represent the most important concepts 
of specialized texts) but is also misleading: one finds 
elements being proposed as terms that clearly do not 
have such status. 
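
The weakness of straightforward frequency counting is easy to demonstrate. The sketch below simply counts adjacent word sequences; the function name `frequent_ngrams` and the toy input are assumptions made for illustration.

```python
from collections import Counter


def frequent_ngrams(tokens, n=2, top=5):
    """Return the `top` most frequent sequences of n adjacent tokens."""
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(grams).most_common(top)


tokens = "the ground station and the ground station of the".split()
candidates = frequent_ngrams(tokens, n=2, top=2)
print(candidates)
```

On this input ‘the ground’ is proposed just as often as ‘ground station’, although only the latter could plausibly be a term. Some form of linguistic filtering, for example by part of speech, is needed before such counts become useful.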

Recently, there has been an increase in research 
into this entire area. Daille et al. propose an approach
combining statistical and linguistic techniques,
applied to aligned texts (original plus translation), to 
discover compound terms, where the statistical tech- 
niques used are sensitive to both frequency and 
association characteristics of the data. A disadvan- 
tage of this work, as with all such work involving 
aligned texts, is that, as the quality of the translation
is always in doubt, results must be interpreted
with caution. A good overview of the problems of 
extracting multiword compound terms is given by 
Lauriston.

The particular form that terms take, in various 
text types and subject areas, is furthermore critical to 
their recognition as terms. Each domain has its pre- 
ferred methods of term formation. It is important to 
have knowledge about term formation possibilities 
and to know how formations may be affected by a 
change of register, of text type, of communicative 
situation and so on. Among the various types of term 
formation are: derivation, compounding, back for- 
mation, borrowing, simile, conversion, compression, 
and so on. Ananiadou investigates how linguistic
knowledge of term formation can be used to drive a 
term recognizer. Such knowledge is also highly use- 
ful in aiding the translator or terminologist in the 
synthesis of terms and in helping him or her to decide 
how to realize a concept in some context. What is 
clear from research in this area is that certain types of 
term formation are quite intractable at present, from 
the point of view of trying to recognize them in 
running text. Also, it is clear that even successful 
processing can only hope to propose potential occur- 
rences of terms. Human interpretation must in the 
final analysis be brought to bear to decide whether a 
form is indeed functioning as a term. The aim of 
automatic term recognition is then to attempt to 
recognize all potential, rather than actual, terms in a 
text, hopefully including all actual terms within the 
set of potential terms discovered, while excluding 
forms deemed not to be terms. 
texts in each of the languages under study, in order to
determine firstly the conceptual and terminological
system of each language independently and only
subsequently to establish mappings between these. 
This is a methodology that should be adopted at all 
levels of terminological investigation, thus applica- 
ble to term variants, collocations and phraseological 
behaviour. It is not an easy task to work in this 
manner but the results are bound to be of higher 
quality than if we had worked with translated texts. 
After all, the translator wishes to determine how 
information should be expressed, given some com- 
municative situation, in the target language. Such 
knowledge is really only to be found in original texts 
of the target language and original texts of the source
language. 
Abstract:

This article summarizes the main findings of a survey, undertaken in early 1994, of open access CD-ROM in 
British public libraries. The survey examined how well Public Library Authorities (PLAs) were implementing 
CD-ROM technology for public use and how well the general public were faring with CD-ROMs. The survey was
both quantitative and qualitative in nature: current national statistics for CD-ROM distribution in PLAs were 
sought, case studies of 13 PLAs who provided open access CD-ROM were conducted and finally an end-user 
survey of 4 of these libraries was undertaken. The principal findings of the survey are as follows. In 1992 only 
5% of PLAs provided CD-ROMs for public use, but by 1994 this figure had risen to 12%. London and English 
County PLAs had the highest proportion of CD-ROMs for public use. PLAs with CD-ROM services were not
necessarily the big spending authorities. National newspapers accounted for the majority of CD-ROMs in use. 
The main management concerns were lack of adequate user training and documentation. All PLAs wanted to 
update and expand their CD-ROM services. This matched one of the main demands from users, which was for 
more facilities, but PLAs failed to address the users’ other main demand - the provision of greater assistance. 
The predominant user group was young students. Educational institutions played a significant role in training 
users in the use of CD-ROMs. Most users searched newspaper and business titles. Finally, there was a high level 


this technology — otherwise schoolchildren will be 
showing public librarians how to search computer- 
ized encyclopaedias and other reference works. 


Aims and scope 

The aims of the research were to examine the scale 
and impact of open access CD-ROMs in public 
libraries. A number of questions need to be 
answered: (1) how well, or not so well, had the 
technology transferred to the public library envi- 
ronment?; (2) were public libraries effective at 
exploiting the technology?; (3) were present public 
library users obtaining any benefits from CD-ROM 
services, and especially, were there any signs that 
CD-ROM was changing users’ information-seek- 
ing behaviour? 

To obtain the answers to these questions a 
three-part investigation was undertaken. Firstly, as 
there was very little information available on open 
access CD-ROM in British public libraries, it was 
necessary to establish both the general patterns of 
CD-ROM distribution and the mode of access. 
Secondly, a detailed examination was conducted 
of how public library authorities (PLAs), with 
open access CD-ROM, operated their services. 
Thirdly, the study looked at the end-users, deter- 
mining who they were and what impact CD-ROMs 
had on them. 

The study confined itself to the use of commer- 
cial CD-ROMs in UK public libraries. 
of user satisfaction with CD-ROM searching.


Introduction 

After the OPAC, the CD-ROM is probably the 
most widely available and most accessible compu- 
terized information system — and, what with the 
current massive advertising campaign to sell CD- 
ROM to the consumer on the back of multi-media, 
this technology will surely soon overtake the posi- 
tion of the OPAC. The unique feature of CD-ROM 
is its relationship with end-users: CD-ROM is both 
very popular with users and it is also a potentially
very accessible information system. While some 
forms of computerized information retrieval are 
still highly elusive to the end-user — remote online 
retrieval is a case in point - CD-ROM technology 
is well established amongst end-users in academic, 
medical and other specialized libraries. This has 
not escaped the notice of public librarians and 
recently a small group of public libraries have 
begun offering their users direct access to CD- 
ROM databases. Public libraries, however, are 
confronted by many more problems when it comes 
to introducing the technology: they do not have the 
same well-defined user group as academic or spe- 
cialized libraries; nor are public library users 
‘information-trained’ as many academics or stu- 
dents are; and their heterogeneous population means 
that a wide-range of CDs are required to meet the 
users’ needs. But if public libraries are going to 
play a major role in information provision in the 
21st century, then they will surely have to embrace 


CD-ROM in public libraries

Not much has been written about CD-ROMs in public
libraries, most studies concerning themselves with
academic and special libraries. The results from a
project in the US to set up databases in public libraries
indicated that CD-ROM offered the ‘opportunity to
interact with one of the most exciting new resources in
the realm of information retrieval’ (Peeling, L. H.,
1990:174). In Sweden a project was set up to introduce
CD-ROMs to the public and see whether or not
they were suitable for public libraries. The results of
that project (Wiksten, 1990) revealed that:

• one third of users were under the age of 16

• the main reason for use was academic research or
study

• most users were students

• less than half of users questioned found what they
were looking for, most of whom did not ask for
help

• only half of the users asked for staff help

• most users found the lack of a search language
standard a problem.

Only a few articles have been published about 
CD-ROMs in British public libraries. From what has 
been published, we discover that public libraries use 
CD-ROM as a bibliographic tool in a largely refer- 
ence context (Ackeroyd 1990; Foulds & Foulds, 
1991). It is particularly surprising how little statisti- 
cal data there is available. Even The Library & 
Information Statistics Tables (LISU, 1994) do not 
provide figures for the number of CD-ROMs avail- 
able in public libraries. The most useful documents 
are Chris Batt's 4 bi-annual surveys of the use of
information technology in UK public libraries. The
first survey, not surprisingly, made no mention of 
CD-ROM (Batt, 1985). In the second survey (Batt, 
1988) only two public library authorities were think- 
ing about CD-ROM. By the third survey (Batt, 
1990) 49 authorities had CD-ROM but no indication 
was given about type of access. In the fourth survey 
(Batt, 1992), 101 PLAs had CD-ROM of which only 
9 provided open access. More recently, Shields'
1993 Union list of CD-ROMs in London libraries
showed an increase in the number of London PLAs
that had open access CD-ROM.

Though CD-ROM is seen as a popular form of 
information delivery, very few PLAs have a policy 
of open access. Most public libraries do not see open
access as a high priority; instead, staff bibliographic
use and reference support were the main emphasis
(Batt, 1992).


The future 

The literature on whether or not CD-ROM has a 
future is inconclusive. CD-ROM has been called a 
'transient technology' with only a few years of life
left in it (McSean & Law, 1990). But if the growth of
commercial CD-ROM titles is anything to go by, the
technology has a big future. The number of titles
published continues to rise, with up to 6,000 titles in 
Literature review

The amount of literature on CD-ROMs is vast and 
overwhelming. The early CD-ROM literature con- 
centrated on how CD-ROMs fitted in with other 
information tools (Harter & Jackson, 1988). CD- 
ROM's main potential was seen as a means for 
end-users to conduct their own computer searches, 
thereby giving the user greater access to information 
(Harter & Jackson, 1988). Reese (1990) found that 
users tended to become enamoured with CD-ROM 
and neglected conventional sources because of its 
quickness and sophisticated search capacities. One 
of Cannell's (1990) key findings was that CD-ROMs
increased the status of both the library and staff as
well as encouraging users to access library stock. By
1993, CD-ROM seemed well established, with end-user
searching the most popular method of access.
As the numbers of titles increased, evaluation and
selection of discs became more important than the
way information was presented on CD-ROM (Kinder
& Preston, 1993).


User surveys 

User surveys generally indicate that most end-users
were satisfied with their searches, though, tellingly,
other surveys indicate that up to a third of the data
retrieved was useless (Kenny & Schroeder, 1992). Harter
and Jackson (1988) found that skill was still needed to
carry out effective searching. End-users tended to be
less successful but, they asked, did it matter? Steffey
and Meyer (1989) found, not surprisingly, that the
amount of experience affected users’ perceptions of
how useful and easy to use CD-ROMs were. LePoer
and Mularski's survey findings showed that users
were not searching CD-ROM databases efficiently.
Whitaker's (1990) study of user behaviour found end-user
searching ineffective, with inappropriate selection
of database a common problem. Johnson and Rosen
(1990) stressed the importance of end-user instruction.
Users tended to believe they had retrieved all
relevant documents when they searched a computer
database. If they didn't find anything, users assumed
that the information didn't exist, rather than thinking that
an error had been made (Johnson & Rosen, 1990).

Day (1994) outlined the main problems about 
end-user searching: 

• because the user is working alone, there is no
longer the 'safeguard of a professional intermediary'
to offer advice about searching (Day, 1994:138)

• no user interface is so good that some kind of
training is not needed

• end-user interfaces make it appear that searching a
complex database is easy, yet do not help the
user to select appropriate sources or construct
sophisticated search strategies

• because information is so nicely packaged on CD-ROMs,
it is too easy for users to forget other
complementary sources. Users are often unaware
of the context in which they are searching.
Ideally, a survey of all 167 PLAs would have
given the most extensive and up-to-date informa- 
tion but resources did not allow this. 


Public library authorities offering end-user access 
Nine PLAs were identified, from Batt's 1992 survey, 
as offering an open access service and questionnaires 
were sent in January 1994 to these authorities, to be 
returned by February 1994. A further 8 PLAs were 
identified from Shields' list and questionnaires were 
sent in February 1994, to be returned by March 1994. 
The 3 PLAs who replied to the LAR Help Line would 
have been surveyed had there been enough time to 
send out the questionnaires and get them back before 
the study's deadline. All told, 17 questionnaires were 
sent, 13 of which were returned in time. (See appen- 
dix I for list of participating authorities.) 


Public library users 
Three PLAs, Croydon, Essex and Sutton, were se- 
lected from Batt's 1992 survey as being suitable sites 
for a user survey. A pilot survey was carried out in
early January 1994 at Essex. The surveys took place
during the first part of 1994 at 4 library sites (Croydon's
Central Library, Essex's Chelmsford and
Southend Branches and Sutton's Central Library).
A set of questionnaires was left at the 4 sites, to be
completed by users and sent back to the researchers
by March 1994. In all 87 questionnaires were com- 
pleted and analysed. 


Results 

The overall picture 

Types of access 

Batt's 1992 survey revealed that 60% of PLAs had 
CD-ROMs. In 1994 the percentage of PLAs with
CD-ROMs had risen only marginally, to 62%. More
significantly, the proportion of PLAs offering open
access had increased considerably. In
1992, 5% of PLAs offered a fully public CD-ROM
service. By 1994 this figure had risen to 12%, a more
than 100% increase. The overall change was that
PLAs were moving towards end-user provision. The 
pattern appeared to be of PLAs first obtaining biblio- 
graphic CD-ROMs for use in bibliographic services, 
then increasing the provision of reference titles for 
staff use in dealing with enquiries, moving on to- 
wards intermediary searching for users and finally 
providing full public access. 
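
The 'more than 100% increase' quoted above is a relative change, as distinct from the rise of 7 percentage points from 5% to 12%. A minimal sketch of the arithmetic in Python, using only the two percentages reported in the text (the function and variable names are illustrative and not part of the study):

```python
def relative_increase(old_pct: float, new_pct: float) -> float:
    """Return the relative change from old_pct to new_pct, as a percentage."""
    return (new_pct - old_pct) / old_pct * 100


# Share of PLAs offering a fully public CD-ROM service, as reported in the text.
open_access_1992 = 5.0
open_access_1994 = 12.0

# Absolute change, in percentage points.
point_rise = open_access_1994 - open_access_1992
# Change relative to the 1992 starting figure.
relative_rise = relative_increase(open_access_1992, open_access_1994)

print(f"Rise in percentage points: {point_rise:.0f}")
print(f"Relative increase: {relative_rise:.0f}%")
```

On these figures the relative increase works out at about 140%, which is consistent with the phrase 'more than 100%'.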


Distribution of CD-ROMs in PLAs by region 

The CIPFA estimates for 1993/94 were used to build 
up a picture of the distribution of PLAs with CD- 
ROM by region (Table 1). The regions with the 
largest number of PLAs with CD-ROMs were the 
English Counties, with 31 PLAs and the Metropoli- 
tan Districts, with 25 PLAs. Most of the PLAs with 
open access CD-ROM were in English Counties, 
with 6 PLAs, and in London, with 5 Inner and 5 
Outer PLAs. 
circulation by December 1993 (Hanson, 1994).
Chadwyck-Healey (1991) and Day & Whitsed (1991) 
defend CD-ROM by emphasising that most tech- 
nologies are transient to a certain degree and 
CD-ROM has had a vital role to play in giving end- 
users access to and control of information. CD-ROM 
has also been seen as being responsible, in part, for 
'the electronic information revolution' (Cox, 
1994:15). Bevan (1994), while accepting CD-ROM's 
technological shortcomings, believed that no tech- 
nology is without its drawbacks. CD-ROM has 
certain advantages over other alternative means of 
delivering access to information sources: 
• no telecommunication charges
• costs are known in advance
• aimed at end-users and saves staff time
• offers a wide variety of information at various
prices. Some of the latest multimedia reference
titles are cheaper than print and offer exciting ways
for users to gain access to information.

It is these very features which make CD-ROMs 
so attractive to public libraries and ensure that they 
will feature strongly in strategic planning in the 
public library environment. The alternatives, such as 
networking and using JANET, are not feasible op- 
tions for public libraries at the moment (Batt, 1994). 
In the long term, networking could be a viable means
of accessing information for public libraries. For the
present and near future, CD-ROM has a significant 
role to play in the provision of electronic informa- 
tion in public libraries which is accessible to users. 


Methodology 
There were three parts to the project and methods 
were tailored accordingly. 


Statistical distribution of PLAs with CD-ROM services

There have been no substantive studies of open
access CD-ROMs in UK public libraries. To build an
up-to-date picture of the current distribution of CD-ROMs
in PLAs, a combination of sources was used:

1. Batt's surveys are wide-ranging and excellent
sources of data but do not deal with the use of CD-ROM.
Unfortunately the 1992 survey was not
very current and the next survey wasn't due to be
published until late 1994.

2. Shields' Union list of CD-ROMs in London libraries
is more current, dealing with libraries in 1993,
but is limited in its geographical coverage. Though
it covered London libraries, some Outer London
PLAs were not included.

3. a letter was placed in the Help Line section of the
Library Association Record 96 (3) March 1994
p. 142, requesting up-to-date information about open
access CD-ROMs in PLAs. Three PLAs, Gwent,
Isle of Wight and Wolverhampton, responded to
the letter.

4. the Public Library Statistics 1993-94 Estimates
(CIPFA, 1993), which provided other data on PLAs.
Table 1. Distribution of CD-ROM service in all PLAs by region


Range of titles provided by PLAs

In all, 20 authorities provided, between them, 189
CD-ROM titles (Table 2). The range of titles was
very varied. Some PLAs specialized in certain sub- 
jects, especially business information. Bromley, 
Croydon and Chelmsford have Business Informa- 
tion Units which offer access to titles, for example: 
Fame, Extel Financial, ICC, Kompass UK, 
McCarthy’s. Other PLAs offered an all round mix- 
ture of titles. The general range of titles available 
comprised newspapers, bibliographic databases, 
encyclopaedias and official publications. 


Subject category | No. | %
Business | 19 | 10
Reference | 15 | [illegible]
Educational | [illegible] | 7
Table 2. Range of CD-ROMs available


The large number of bibliographic titles reflects the 
staff base from which most CD-ROM services have 
grown. Most of the titles were scattered among many 
subjects. However, newspapers (22%) are obviously
the real success story, capturing nearly one-quarter of
the public library market. There are several factors 
which could account for newspapers' popularity: 
Although some regions had large numbers of
authorities with CD-ROMs, such as Metropolitan 
Districts with 25, it did not necessarily follow that 
the CD-ROMs were available for public use. When
calculating the proportion of PLAs which had open
access, a different picture emerged, with Inner and
Outer London providing a higher proportion of their
CD-ROMs for public use. While London had
just 13% of PLAs with CD-ROMs, 5 out of 7 (71%)
Inner London PLAs and 5 out of 15 (33%) Outer
London PLAs provided open access. This contrasted
with Metropolitan District PLAs, which provided only
8% of their CD-ROMs for public use.
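
The London proportions quoted above follow from simple division of the counts given in the text. A minimal sketch in Python (the dictionary layout and the function name are illustrative assumptions, not taken from the study's own data files):

```python
# (PLAs offering open access, PLAs counted in the region), as reported in the text.
london_plas = {
    "Inner London": (5, 7),
    "Outer London": (5, 15),
}


def open_access_share(open_access: int, total: int) -> int:
    """Percentage of a region's PLAs offering open access, rounded to a whole number."""
    return round(open_access / total * 100)


shares = {
    region: open_access_share(open_count, total)
    for region, (open_count, total) in london_plas.items()
}

for region, share in shares.items():
    print(f"{region}: {share}%")
```

Rounding to whole percentages reproduces the 71% and 33% given in the text.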

Is there any significance in London PLAs, par- 
ticularly Inner London, having higher proportions of 
open access than PLAs in other regions? Some pos- 
sible explanations for this are that: 

1. the types of users and user needs in London are
different

2. London PLAs had a different philosophy towards
IT and user access

3. a high concentration of academic institutions in
London has had a knock-on effect on London
PLAs

4. PLAs in London have a different attitude to open 
access because of the influence of progressive 
librarians, like Chris Batt at Croydon. 

Further examination of the CIPFA statistics seems
to show that there is no correlation between expenditure,
size of population or number of service points and
the number of CD-ROMs available. It was particularly
significant that the PLAs with the highest number
of CD-ROMs were not the highest spending authorities:
you obviously do not need money to innovate.
CD-ROM selection methods and user involvement
Only 42% of PLAs had selection policies and, though
requested, very few PLAs gave details of their policies.
Only Essex had a written selection policy. All
13 PLAs did, though, have a method for selecting CD-ROMs,
albeit a very vague one. The main selection
criterion was whether or not the CD-ROM met
users' or the library's requirements. Not surprisingly
(but disappointingly), no PLAs went in for anything
like user involvement regarding selection. One wonders
how they matched users' needs.


CD-ROM evaluation methods and user involvement 
It was pretty much the same story when it came to the
ongoing evaluation of titles. Half of the PLAs did not
have any evaluation policies. None of the PLAs
involved users in CD-ROM evaluation. There were,
however, various methods which were used to evaluate
CD-ROMs, such as:

• usefulness of disc
• accessibility, currency, coverage, speed
• staff comments
• use by users.


Impact of the CD-ROM service on staff 

It was very clear from all the responses that CD-ROMs
have had a major impact on library staff. 62%
of PLAs found that the CD-ROM service had affected
staff work patterns. Staff were required to learn new
skills in order to deliver the open access service; 69%
of PLAs trained their staff, usually in-house. Work
patterns changed, with more emphasis on user educa- 
tion and systems management. More time was spent 
helping users or on user education. Though a fair 
number of PLAs found that CD-ROMs improved 
reference work, staff still had to devote time to main- 
taining the service. All PLAs, except one, reported 
positive staff reactions to the CD-ROM service. 


Impact on other sources and services 

The majority of PLAs found that the CD-ROM 
service had not had an impact on the use of printed 
sources. Only 1 PLA found that the CD-ROM serv- 
1. they are a full-text, one-stop information resource.
The nightmares of filing and storage are dispensed
with.

2. newspapers on CD-ROM offer far superior searching
facilities than hard copy services ever did.

3. the high universal appeal of newspapers as sources
of information.

Given that children are such big users of public 
libraries and given the large numbers of educational 
titles available and their obvious popularity, it is 
surprising that educational CDs are so poorly represented:
just 7% of all CDs.


Public libraries providing end-user access 
Reasons for setting up a CD-ROM service 

The earliest CD-ROM service was started in 1989 by 
Sutton PLA. It was not clear, however, whether this 
service was open or closed access. Most CD-ROM 
services came on stream in 1990 and 1992. PLAs 
were asked what the major decisions were behind 
setting up the CD-ROM service. This proved to be an 
interesting question, as each PLA had its own reasons
for setting the service up. The most common
reason was to improve access to information. Two
northern PLAs were involved in the PANDA Project
(Public Access to a Newspaper Database and Archive),
which involved publishing The Northern Echo
on CD-ROM (see Chapman, 1994 for more details).
Southwark had a different reason: to improve literacy
and information skills. Unusually, the CD-ROM
service was run by the Children's Librarian and its
range of titles was mainly for children. Most of the
CD-ROM services were initially funded from the 
bookfund and the IT budget. This reflected both the 
software and hardware needs of the service. 


Main goals, aims and policies 

The majority of PLAs had not set any goals for the 
CD-ROM service and only 6 out of the 13 had identi- 
fied potential user groups, and these were mainly 
business users or students. Very few PLAs seemed to 
have set up a CD-ROM service with open access 
deliberately in mind. Most PLAs did not have any 
written policies either. Only Essex had formalized 
written policies for the service. The lack of a targeted 
user group for the CD-ROM service and the lack of 
very clear goals or policies perhaps indicates the 
‘organic’ nature of the CD-ROM service, where use 
evolved from bibliographic staff, to reference staff, to 
intermediary searching and finally to public use. 


Facilities for the CD-ROM service 

The 13 PLAs varied greatly in what they provided in
the way of workstations and titles (Table 3). All 13 PLAs
had stand-alone CD-ROM workstations; only Croydon
had a network. Two to five stations were the
norm, though the number of titles on offer varied
enormously, from 1 to 25. (Many academic and
special libraries would be hard put to beat this latter
figure.)
Table 4. Future plans for CD-ROM service


Information was easier and faster to access, espe- 
cially in the case of newspapers. The CD-ROMs 
provided more accurate and up-to-date information 
with a much wider scope, than hard copy equiva- 
lents. Despite a widespread lack of monitoring, all 
PLAs felt that users had benefited from the CD- 
ROM service. Although most PLAs reported benefits 
from the CD-ROM for their staff, such as increased 
access to information — particularly in respect to 
subject searching, improved quality of enquiry serv- 
ice and greater efficiency — this was not specific to 
the provision of an open access service. 


Problems identified 

The CD-ROM service was not trouble free for the 
staff, with nearly two thirds of PLAs experiencing 
problems. Problems cited were: lack of expertise and 
confidence (50% of respondents mentioned this); 
not enough time to learn skills (50%); and having to 
deal with equipment and users (16%). Finally one 
PLA complained that staff had to queue up behind 
end-users! 


Public library users’ experiences of CD-ROM 
These results were based on a survey carried out at 4 
library sites. 


User profile 

Full-time students were the largest group of users 
(56%), followed by those in full time employment 
(25%). Only one user was retired and there were no 
users in the ‘looking after the home’ category. The 
predominant age group was 16-25 years (61%), fol- 
lowed by 26-35 years (16%). There were no users in 
the over 65s category. Overall, slightly more users 
were male (55%). The data suggested that CD-ROMs 
were not seen as either a male or a female technology.
The typical CD-ROM user was a young male stu- 
dent, who would be using the public library as a 
branch of the college. 


Levels of experience 


61% of users had used CD-ROMs in the surveyed 
library before and 64% had also used CD-ROMs 
ice increased print usage, while 3 found a decrease
in print usage. However 69% of PLAs had made 
changes to their print subscriptions. Only 8 PLAs 
had online services, of which 5 found that online 
sources had been affected. 4 PLAs experienced a 
decrease in online use, while another PLA found 
that, initially, there was an increase in online use 
but this was then followed by a decrease. Despite 
the decrease in use, only one PLA intended to cut 
back on the number of online databases provided. 
Most of the PLAs had not experienced any increase 
in the amount of photocopying, as a result of the 
introduction of CD-ROMs. Only 2 out of 12 PLAs 
found an increase in photocopying, while in the 
case of one there was a decrease. This result dif- 
fered greatly from other libraries’ experiences where 
photocopying increased with the use of CD-ROMs 
(Cannell, 1990). 


User promotion, training and documentation 

85% of PLAs advertised the CD-ROM service to 
library users — mainly in the form of display signs 
or notice boards. None of the PLAs were commit- 
ted to any formal user training, preferring to offer 
casual one-to-one training as requested. It was not
clear why most PLAs did not want to offer any 
formal training, though the fact that formal training 
would take up much time and resources would have 
loomed large in their thinking. Users were offered 
assistance of some kind by 92% of PLAs, and this 
was usually in the form of personal help with 
searches. The fact that only 3 PLAs provided docu- 
mentation for their users was particularly disturbing 
given the type of user to whom public libraries 
address themselves. It is not surprising then that 
users reported a need for more documentation in 
the user survey. 


Monitoring of the CD-ROM service 

Only 15% of PLAs collected any statistics on CD- 
ROM usage with only 2 PLAs having any definite 
monitoring system. Two thirds of PLAs did not 
monitor users’ opinions of the CD-ROM service, 
with only 2 PLAs collecting users' responses informally.
Without such data it would surely be difficult
to develop the service effectively, and many costly
mistakes are likely to occur; more importantly,
perhaps, good opportunities are likely to be lost.


Future plans for the CD-ROM service 

All PLAs wanted to increase the CD-ROM services, 
mainly by increasing the number of titles or number 
of sites and terminals being offered. Although 8 
PLAs also wanted to attract new users, training and 
assistance were not high on their agendas (Table 4). 


Overall benefits of CD-ROM service 

The majority of PLAs felt that the benefits derived 
from the CD-ROM service were mainly of a general 
nature — improving the overall information service. 
Number of databases searched

A large proportion of users (55%) searched only one 
database, while 16% of users searched 2 databases
and 10% searched 3 databases. This pattern of use
possibly indicates the tendency of CD-ROM users to 
rely too much on one source for all their information 
needs (Reese, 1990). 


Success or otherwise of searches 

Users were asked about their search results, such 
as the number of items found and the usefulness of 
search. This was where one found out whether or 
not CD-ROMs had delivered the goods. Most users
managed to find something during their search:
33% found between 1 and 5 items, 32% found over
30 items. Only 6% of users found nothing at all.
Users were generally able to retrieve useful information:
66% of users found the items ‘very useful’
or ‘useful’ (Table 6). The results were very encouraging:
public library users derive definite
benefits from an open access service; users are 
able to search CD-ROM databases and retrieve 
useful information despite being relatively inexpe- 
rienced searchers. 
Table 6. Usefulness of information retrieved


Problems 

Over two thirds of users, surprisingly, did not experience
any problem with using the CD-ROM service,
reminding us perhaps that this was after all a ‘user-friendly’
tool. A quarter of users did, however,
experience some kind of problem, such as:

• technical problems (39%), usually involving the printer
• finding what they wanted (34%)
• using database commands (17%).

The types of problems experienced by users indi- 
cate that staff could play a more significant role in 
developing users' search skills. Providing documentation
and easy-to-understand explanations of the
various command languages would also eliminate
some of the problems.


Help and assistance 

Even though only a quarter of users experienced 
problems, just under half of users did need to ask for 
help. Half the users needed help with technical prob- 
lems (printers yet again). But much of the help 
needed by users could have been addressed by good 
documentation. Table 7 shows the types of help 
needed. 
elsewhere, mainly at college or university. Quite
a large proportion of users were therefore new to the
service, making supporting documentation
even more valuable. Users’ CD-ROM experience
was spread across a wide range of levels, with
40% having had only a little experience; only 13%
regarded themselves as experienced.


Locating and accessing the CD-ROM service 

Most users found out about the CD-ROM service
by word of mouth, from library staff (30%) or
friends and colleagues (24%). A fair percentage
(18%) found out about the service by accident.
One wonders how many more users there would
be if the public knew there was a CD-ROM service
available and assistance was more obvious. Once
users had found the workstations, they did not have
to wait long before gaining access to the CD-ROMs;
only 17% of users had to wait before using them.
This differs from the situation in educational libraries,
where it appears to be common to have to
wait. The amount of time spent on the CD-ROMs
was quite varied: 28% of the searches lasted 15-30
minutes, while 42% took between 45
minutes and over an hour. Very few searches (13%)
were under 10 minutes; interestingly,
most online searches would be completed in that
time span. This pattern of use could indicate a lack
of experience at searching, with users probably
needing a fair amount of time to conduct their
searches.


Purpose for using CD-ROM 

The main reason for using the CD-ROMs was
study or research (77%). This was consistent with the
user group profile, i.e. students. CD-ROMs did not 
appeal to the purely curious, as no user searched the 
CD-ROMs without a particular reason! 


Databases searched 

Users were asked which databases they had searched. 
17% of respondents didn’t answer this question, 
obviously unaware of what they were searching! As
not all 4 libraries had the same CD-ROM titles,
usage was measured by type rather than by title
(Table 5). Newspapers, with 47%, and business titles,
with 43%, were the most heavily used.
Provision would seem to match use, as newspapers
and business titles form the bulk of the CD-ROMs
which PLAs provide; but of course, on the other
hand, you can only use what is there.
Table 5. Type of database searched
friendliness is partly to blame here, but library staff
too lack training skills. Also hardly any monitoring 
of use or evaluation of the product is going on and 
this is likely to have serious consequences in the 
future, for CD-ROM is anything but cheap.

A clear picture emerges, from the user survey, of
who the main users are and why they use CD-ROM.
Most users are young students who use the CD-ROMs
for study purposes. Of those who had used
CD-ROMs elsewhere, most had done so at an educational
institution. Most users had only a little experience
of using CD-ROMs. Most users search newspapers and
business titles: a not surprising revelation given that
these two categories account for most of the CDs on
offer. Most of the problems experienced by users are
either technical or to do with finding the right information.
Though CD-ROMs are thought to be
user-friendly tools, users still require some form of
guidance and assistance. User satisfaction is quite
high, with 66% of users finding their search results
useful or very useful. Users are very positive about
CD-ROMs: they like them and want more! It is also
apparent, both from survey results and the literature,
that some kind of user training is needed. CD-ROMs
appear to be a user-friendly tool, are popular with
users and allow users to access the information
themselves, but this does not negate the need for
staff to provide help. Public libraries have a particular
problem regarding user training because, unlike
academic libraries, they do not have a ‘captive audience’
for targeting training (Day, 1994:138).


Conclusion 

While researching the topic it became apparent that
there was a lack of up-to-date information on
CD-ROMs in public libraries. This reflects the general
lack of research in public libraries. There is a need both
to improve library statistics and to look
much more closely at what goes on in the public
library. The British Library Research and Develop- 
ment Department seems to acknowledge that there is 
a need for more research into public libraries (Re- 
search Bulletin, 1993). 

Much has been said about CD-ROM's future 
but CD-ROM will be around for a long while yet. 
CD-ROMs are an ideal product for public libraries, 
as costs can be controlled and supervision require- 
ments are low. CD-ROM has played a significant 
role in raising the profile of libraries (Bevan, 1994;
Cannell, 1990; Reese, 1990) and can play a
crucial role in lifting the status of the public library 
as a modern information provider. Furthermore, as 
more and more of the public become information 
trained, at school or at college, more will expect to 
find CD-ROMs in their local libraries. The online 
revolution has largely passed public libraries by 
(though they may get another opportunity to get 
aboard with the Internet). Let us hope that they do
not miss out on CD-ROM too, for this is a technology
far more up their street.
Users’ evaluation of the CD-ROM service

Most users (84%) thought the selection of databases 
was good to fair. There was good news for library 
staff as 95% of users thought quite highly of them. 
Only a tiny percentage of users thought staff were 
very unhelpful. Practically all the users (94%) found 
the CD-ROMs ‘easy’ to ‘very easy to use’. Another 
success for open access CD-ROM! It was surprising 
that so many people found using CD-ROM so easy; 
perhaps people might not like to admit that they 
found a ‘user-friendly’ tool difficult to use. 60% of 
users took up the offer to make suggestions for 
improving the CD-ROM service (Table 8). From the 
table it is clear that users want more choice and 
facilities as well as more documentation and
information.


Suggestions
More CD-ROM titles
Improved facilities (e.g. more or better computers)
More information

Table 8. Suggestions for improvements


Future use by users 

When asked if they would use the CD-ROM service 
again, almost all users said they would. Many users 
added impromptu compliments in this section. From
the users' point of view CD-ROMs were very successful.


Summary of main findings 
Just 12% of public library authorities provide CD-ROMs
for their users. However, the trend is towards
greater end-user access. London and the English
Counties are in the vanguard of end-user provision.
In the case of London nearly a third of PLAs provide
end-user access to CD-ROMs. Newspapers are the
most common type of CD-ROM available in PLAs.
For the PLAs offering an open access service
things have gone well and all seek to expand their
service, largely in respect to the number of titles on
offer. But what is being neglected in the rush to
expand the service is the user. Their training and
documentary needs are not being sufficiently
considered. Undoubtedly CD-ROMs' alleged user-
The user study highlighted the significant role
schools and colleges have played in promoting the
use of CD-ROM. So far, public libraries have been
‘lucky’ in that a lot of their CD-ROM users have used
CD-ROMs elsewhere, mainly at an educational in- 
stitution. This has enabled them largely to ignore or 
overlook the training and support consequences of 
CD-ROM provision but if they are to reach a wider 
and larger audience, as they all say they want to, then 
much greater user support will have to be provided. 
Long term implications are that as more and more 
people come into contact with CD-ROM at school 
and at college, so public libraries will be expected to 
provide a range of sources on CD-ROM that will 
meet their expectations. Will a cash-strapped service 
with so many fingers in so many pies be able to rise 
to these expectations or will the demand be met by 
growing numbers of multimedia workstations in the 
home? Well, one thing is for sure, with the rapid 
expansion in the CD market we shall know the 
answer within the next year or two. 

Finally, public libraries have an important role to 
play in providing universal access to information. 
Open access CD-ROMs in public libraries would
seem to be able to enhance that role significantly, for
‘CD-ROM provision is arguably one of the more
important gateways to knowledge that the information
age has forged.’ (de Saez, 1994:125). One sees
a historical parallel between providing open access 
CD-ROMs and the revolutionary movement of open 
access shelving at the turn of the century (Kelly, 
1977). One of the implications of giving the public 
access to CD-ROMs is that they might become more 
‘information conscious’. By using electronic 
databases and becoming familiar with the large 
amount of data accessible, they might become aware 
of the true vastness of the world of information. 
Information horizons could be broadened and infor- 
mation searching skills formed. 
Abstract


This paper describes ongoing developments in the LRE-2 project SECC (Simplified English Grammar and Style 
Checker/Corrector). After a general description of the project, the approach to building the SECC writing tool is 
discussed. First, lingware issues are dealt with: resources used, technical implications of simplified grammar 
correction as machine translation, testing and evaluation issues. Next, we take a look at software issues, in 
particular the user interfaces. Finally, we discuss some open issues and future developments. 


Resources organization 
In a regular Metal language pair, the following 
lingware components are relevant: 


source lexicon       |
analysis grammar     |  analysis

transfer lexicon     |
transfer grammar     |  transfer

target lexicon       |
generation grammar   |  generation


Analysis. For the English-SE language pair, the 
source language remains a regular language. Hence, 
the existing English lexicon of Metal (over 50,000 
base forms) is reused as source lexicon, as well as the 
existing English analysis grammar. Whereas the lexi- 
con can be taken over as such, this does not hold for 
the analysis. The input the Metal analysis grammar 
handles is in principle only correct regular English, 
but robustness has required that some semi-gram- 
matical or even ungrammatical structures be accepted 
by the parser. These include sloppy punctuation or 
uncommon adverb placement. These characteristics 
of the English analysis are important in the context of 
the non-native writer support SECC wants to offer in 
addition to regular SE checking. Non-native writers 
make mistakes that sometimes render a sentence 
ungrammatical, and hence make the analysis or any 
further processing behave unpredictably or go totally
wrong. Non-native writer support requires that
the analysis grammar be extended to deal with these 
kinds of mistakes. Some of those fall into the cat- 
egory of the phenomena already handled for 
robustness’ sake, but others definitely do not. We 
will come back to the technical details of the ap- 
proach planned in SECC in the section on open 
issues and future work. 


Transfer. As with any new language pair in Metal, 
it is the transfer part that must be developed. The 
transfer lexicon takes care of lexical mappings be- 
General description of SECC

SECC is an LRE-2 project that started in November 
93 and runs until May 96 (2.5 years). Partners in the 
project are Siemens Nixdorf (B), Cap Gemini (F), 
Alcatel Bell (B), the University of Leuven (B), and 
Sietec (D); its total anticipated effort is about 13 
person-years. SECC's main goal is the development 
of a tool for technical writers who produce docu- 
ments in a variant of Simplified English (SE) 
described below. The tool will check if the docu- 
ments comply with the syntactic and lexical rules; if 
not, error messages are given, and automatic correc- 
tion (‘translation’) is attempted wherever possible to 
reduce the amount of human correction needed. 
Special attention will also be paid to non-native 
writers of English (French, Flemish, and German 
writers): it is useless to check SE rules if rules of 
English in general are violated in the first place. This 
will add an extra level of complexity to the tool, as 
will be discussed below. Further points of attention 
for SECC include syntactic checking and correcting 
at textual levels beyond the sentence (from para- 
graph to full text), and full integration within a DTP 
environment (Interleaf6). In all these developments 
SGML and its associated tools play a major role. 


Grammar correction as machine translation 

One of the basic ideas of the SECC project is to treat 
SE grammar checking and correcting as a problem of 
translating English into SE. Given an MT system and 
its development environment, the belief is that the 
machinery offered should be sufficient to create a 
special language pair English-SE. At the same time, a 
development like SECC is meant to show that NLP 
components and environments designed for MT in the 
first place can be (re)used for other applications. For 
SECC, we are using the transfer-based Metal® MT 
system and its development environment. As will be 
explained below, the analysis-transfer-generation cy- 
cle is mapped to an analysis-diagnosis-correction cycle. 
diagnosis plus correction). The grammar is organized
into four major categories; we give an example
rule for each category: 


Textual control 
(1) Do not use articles in titles or headings consist- 
ing of a noun or noun cluster. 


Syntactic control 

(2) Do not express an idea in parentheses or between 
dashes inside a sentence. Use a separate sentence 
instead. 


Lexical control 

(3) Use short regular action verbs. Avoid commonplace
verbs such as do, make, get, be, perform
combined with action nouns.


Character and punctuation control 
(4) Do not use an apostrophe in expressions such as 
in the 1970s, during the 80s, into the 90s. 


Most of the rules are formulated with ‘Use (only)
X’, ‘Do not use Y’ or ‘Avoid Z’. ‘Do not use Y’
implies: Y is always an error; ‘Avoid Z’ implies: if
you cannot but use Z, then it is acceptable. With an
eye to implementation of these rules in SECC's
transfer and generation grammar (for their diagnosis
and correction aspects respectively), it is important
to add that ‘Do not use’ or ‘Avoid’ rules are as much
as possible complemented with a ‘Use (instead)’
part. If this is not the case, the reason is either that it
is obvious from the wording (as in rules (1) and (4)
above), or that the correction complement of the
phenomenon is too complex. All rules are further
accompanied by one or more examples (i.e. pairs of
wrong and correct sentences) taken from the Alcatel
Bell SECC test corpus. In case the correction complement
is too complex, the examples suggest typical
cases and how they can be remedied. Besides being
a pedagogical requirement, the ‘Use (instead)’ part is
a computational requirement for doing automatic
correction.


Generation. Once the transfer phase has annotated
the analysis tree with all diagnosis information about 
detected errors, this information can be turned into 
system output in a generation phase. For an SE 
checker without correction, this output consists of 
error messages (critiques). In the MT context, gen- 
eration normally has a stricter meaning: the input 
source string is replaced by a target string. In SECC, 
both types of generation are done. If correction is
possible, part of the SECC output will be an SE 
equivalent of the erroneous regular English input 
sentence. (It is only when such source to target string 
mappings take place, that we speak about correc- 
tion.) Moreover, both in cases where correction is 
possible and in cases where it is not, SECC will give 
error messages about the input. A distinction we 
tween source and target, and the transfer grammar
handles structural mappings. Given the particular
nature of the target language, we will first take a look
at the definition of SE as used in the SECC context.
SECC's SE represents a subset of regular English, 
consisting of Alcatel Bell's COLEX (a restricted 
regular English vocabulary), COTECH (a restricted 
technical English vocabulary from the domain of 
telephony) and COGRAM (a restricted grammar). 
COLEX contains about 1,500 accepted regular 
English words. It was built both from existing SE 
lexicons (to the extent that they were accessible) and 
from a word frequency list at Alcatel Bell compiled 
from a large body of in-house telecommunication 
texts. COLEX also defines some 500 'translations' of 
regular English words into one of the 1,500 accepted 
SE words. These translations are implemented in the 
transfer lexicon of the language pair at hand. A few
examples (adjective, noun and verb respectively) are:


rapid —> fast
quick —> fast
prompt —> fast
swift —> fast


category —> type
nature —> type
variety —> type
class —> type
kind —> type
model —> type
sort —> type


alter —> change
modify —> change
convert —> change (as in convert money)
transform —> change

The examples are a simplification of the status of 
the respective words in COLEX, in a sense that the 
precise contexts of application are left out. Also, 
non-SE words may have different translations, depending
on these contexts. Prompt, for instance, also
has a translation into immediate, category also goes
to group, and convert to adapt (as in convert a
building). We will come back to this in the technical
details below. COLEX is not definitive, given that 
changes are possible at the request of users or as a 
consequence of computational problems in the proc- 
ess of building the tool (lack of precision in the paper 
version, inconsistencies, circularities, etc.). 
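
The context-dependent mappings described above can be pictured as a small lookup table. The following Python sketch is an illustration only, not the Metal lingware: the table layout, the context labels ("speed", "time", "money", etc.) and the function name `se_candidates` are assumptions; only the word pairs themselves come from the text.

```python
# Hypothetical miniature of a COLEX-style transfer lexicon.
# Each non-SE word maps to (SE equivalent, context label) pairs.
# The context labels are invented for illustration only.
TRANSFER = {
    "rapid": [("fast", None)],
    "prompt": [("fast", "speed"), ("immediate", "time")],
    "category": [("type", "kind"), ("group", "set")],
    "convert": [("change", "money"), ("adapt", "building")],
}


def se_candidates(word, context=None):
    """Return the SE equivalents of `word`.

    If a context is known and matches an entry, only the matching
    equivalents are returned; otherwise all candidates are offered.
    """
    entries = TRANSFER.get(word, [])
    if context is not None:
        matching = [se for se, ctx in entries if ctx in (None, context)]
        if matching:
            return matching
    return [se for se, _ in entries]
```

When the context cannot be determined, the sketch falls back to returning every candidate, which mirrors the behaviour of offering all possible translations.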

COTECH is still in the development stage, and is 
currently a collection of different technical terminol- 
ogy bases at Alcatel Bell. One important subset of 
particular interest to SECC is the technical terminol- 
ogy from a 2,000-sentence corpus that SECC uses as 
test material (see the subsection on evaluation below). 

COGRAM is a 150-rule SE grammar, again based
on available SE rule material and (more importantly) 
on problems found in the test corpus. About two- 
thirds of these rules are computationally tractable in 
the SECC context (i.e. either to do diagnosis or to do 
in German-English. Hence, the existing English
entries contain all information needed for analysis 
and generation. Creating a monolingual SE lexicon 
would lead to duplication of information (present in 
both the English and the SE lexicons) and to diffi- 
culties in maintainability. For certain types of 
information (e.g. morphology), a change in one of 
the two lexicons would entail the necessity to change 
it in the other. A second approach could be to mark
all SE words as such in the existing English lexicon 
(and have only one lexicon loaded). This approach 
also creates problems, both conceptually and 
computationally. First, one must know that Metal 
does not have semantic reading distinctions in its 
monolingual dictionaries; semantic distinctions are 
only made in transfer. So, one can hardly mark a 
word as SE, if only one of its meanings is actually 
SE. Second, the English lexicon serves in many 
language pairs. If for one such pair, a word needs to 
be added, changed or removed, the effects for SECC 
are unpredictable (and vice versa). Moreover, mark- 
ing words for a particular application reduces overall 
reusability and portability. Then what do we do? As
the treatment of reading distinctions already sug- 
gests, the transfer dictionary is the place to be. The 
solution opted for is to leave the English lexicon 
untouched (using it both for analysis and genera- 
tion), and move the information about the SE nature 
of words into the English-SE transfer lexicon. The 
rule is simple: if a word X is SE, then there must be 
a reflexive transfer entry X —> X. For the examples
given in the previous subsection, the transfer lexicon
must contain


fast —> fast
type —> type
change —> change
immediate —> immediate
group —> group
adapt —> adapt.
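
The reflexive-entry rule lends itself to a very small test. The sketch below is a Python illustration under assumptions: the transfer lexicon is modelled as a plain set of (source, target) pairs, and the names `TRANSFER_ENTRIES` and `is_se` are hypothetical, not part of Metal.

```python
# Toy English-SE transfer lexicon as a set of (source, target) pairs.
TRANSFER_ENTRIES = {
    ("rapid", "fast"), ("fast", "fast"),
    ("category", "type"), ("type", "type"),
    ("alter", "change"), ("change", "change"),
}


def is_se(word, entries=TRANSFER_ENTRIES):
    """A word counts as SE-accepted if it has a reflexive entry word -> word."""
    return (word, word) in entries
```

Because SE status lives only in the transfer lexicon, the English lexicon needs no SE marking in this picture.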


Depending on whether a word is never SE, or is 
only SE in one of its meanings, the following entry 
configurations occur in the transfer lexicon. We use 
a convention that symbols for non-SE words are in 
lower case, those for SE words are in upper case. We 
are also only concerned with the behaviour of a word 
within one wordclass. 


1. A word x is never SE; it has Y as its SE equivalent.
x —> Y    e.g. altitude —> height
Y —> Y         height —> height

There may be other entries x —> Y (if some other
word also translates into height), but not the entry x
—> x (altitude —> altitude).


2. A word x is SE in one meaning, but not in another.
(A pair x/X represents the same word, with x its non-
SE meaning(s), X its SE meaning.)

x —> Y    e.g. since —> because  if subclause of reason
X —> X         since —> since    if subclause of time
Y —> Y         because —> because
make in the context of correcting or not is that
between a possible and a certain error (corresponding
to weak and strong diagnosis respectively). Weak
diagnosis (possible error) corresponds to cases where 
a problematic construction can be identified, but not 
diagnosed to be wrong. Two examples are: avoid
splitting infinitives, unless the emphasis is on the
adverb; do not use ‘when’ in conditional clauses.
For the first rule, a split infinitive can easily be 
detected in Metal, but emphasis cannot; for the sec- 
ond, when can be detected, but we have no foolproof 
way to distinguish temporal and conditional mean- 
ings. Strong diagnosis (certain error) corresponds to 
cases where the system can safely assume there is an 
error (see the example rule below). In case of weak 
diagnosis, we do not correct the sentence automati- 
cally. Whether all cases of strong diagnosis can (should) 
be corrected automatically is an open issue, given that 
we are still in the process of implementing correction. 
In any case, if correction is not done, SECC will still 
make useful suggestions about how to change or 
rephrase the sentence. How SECC produces its 
different output types is discussed in the next section. 
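
The distinction between weak and strong diagnosis can be summarised as a simple decision rule. The Python sketch below is illustrative only; the `Diagnosis` record and `output_for` are hypothetical names, and the sketch assumes that every strong diagnosis with an available correction is applied, which the text leaves open.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Diagnosis:
    rule: str                         # rule identifier, e.g. "ERR-WG78"
    strong: bool                      # True = certain error, False = possible error
    correction: Optional[str] = None  # proposed SE sentence, if one exists


def output_for(original: str, diag: Diagnosis) -> str:
    """Return the sentence to emit.

    Only a strong diagnosis with an available correction changes the
    sentence; in all other cases the input is passed through unchanged.
    """
    if diag.strong and diag.correction is not None:
        return diag.correction
    return original
```

In the pass-through cases the diagnosis would still be reported to the user as a message or suggestion; the sketch only models the choice of output sentence.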

To come back to the resources needed for genera- 
tion, we have to say something about the generation 
grammar and lexicon. 

About the generation grammar, we can be brief: it 
uses the available Metal generation machinery (fea- 
ture percolation, tree transformation, string 
generation), and reuses (small) parts of the existing 
English generation component (as used in German- 
English or French-English). Given that the target is a 
subset of the source, this generation component is 
not as extensive as in a regular language pair. For one 
thing, not all input sentences contain errors; if so, 
they remain unchanged in the output. For another, a 
fair amount of errors can only be diagnosed, but not 
corrected; here too, no changes are made. And finally,
even if corrections are made, they are often
local in nature, so that parts of the input can simply
be taken over in the output. Still, a non-trivial issue in 
generation is that of multiple generation in case 
SECC suggests different alternatives for correction; 
we will discuss this issue in the section on the techni- 
cal approach. An example correction at the noun 
phrase level is the following: 


Given the rule 


Use ‘of’ in the genitive case when a possessive 
noun form is inanimate, 


a noun phrase like The module's most important 
function is corrected to The most important func- 
tion of the module. 


The target lexicon deserves a little more atten- 
tion, because it is treated in a special way in SECC. 
In principle, we could create a monolingual diction- 
ary that contains all SE-accepted entries. This 
approach is not taken. A regular lexicon in Metal is
used both for analysis and for generation: the same 
English lexicon serves both in English-German and 
string) from the final tree is nearly impossible. In
short, the classical analysis-transfer-generation cycle
has required some adaptations for SECC.


Analysis. As to the analysis phase, the problem of 
the disappearing input sentence already starts here. 
Let us take the example sentence The module's ma- 
jor function is to localise the fault. At the end of the 
English analysis, the string that forms the leaves of 
the analysis tree is The module's major function is
function to localise the fault. In other words, the 
analysis creates a kind of deep structure, with the 
implicit subject of the infinitive clause made ex- 
plicit. In this case, material is added that did not 
occur in the input string. In other cases, material may 
be moved (adverbs, for instance), or even deleted 
(inflections and auxiliaries, for instance, are 
featurized). The reasons for these manipulations can 
be found in the particular MT application: already 
during analysis, the input string (or rather, the tree 
dominating it) is ‘prepared’ for translation into a 
target language. To overcome this problem, SECC 
adds a post-analysis phase which adapts the analysis
tree so that its leaves correspond to the original input
sentence again, while leaving all vital analysis infor- 
mation intact. To this end, information is retrieved 
from Metal's chart parser structures (which contain a 
complete record of all manipulations of the input). 


Transfer. Once this analysis tree is reconstructed, it 
can enter the SECC transfer phase. Here, the lingware 
routines corresponding to the COGRAM rules in- 
spect the tree for feature information or particular 
clause patterns, annotating it with error labels when- 
ever necessary. At the same time, the transfer lexicon 
(COLEX/COTECH) is consulted to retrieve SE 
equivalents of the non-SE words, together with the 
information as to how they must appear in the cor- 
rected output. All this transfer information is stored 
in the original analysis tree without changing its
structure or replacing its leaves. Here again, it is the
necessity to annotate an untouched input string that 
requires transfer to be structure and string preserv- 
ing. All structural and lexical changes to the output 
are postponed until generation. We stress this aspect, 
because it is not the way Metal works for regular 
language pairs. There, transfer and generation can 
work in an interleaved fashion, with lexical substitu- 
tions and tree transformations gradually overwriting 
the analysis tree and the input string (because no 
longer needed in the end). 


Generation. As to generation, we already mentioned 
that SECC should generate two kinds of information: 
an output string (when correction can be done), as 
well as useful diagnosis information plus sugges- 
tions for correction or improvement. Still, we want to 
respect the Metal generation mechanism, simply cre- 
ating an output string for each input sentence, without 
side-effecting messages along the way. One special 
Another example for nouns is the set of mappings
around error (we leave out the contextual require- 
ments for the different translations): 


error —> defect     |
error —> fault      | error is never SE
error —> mistake    |


defect —> defect
fault —> fault
mistake —> mistake  | mistake is sometimes SE
mistake —> defect   |
The construction of this kind of lexicon is not an 
easy task. For one thing, the amount of semantic 
information available in the Metal system is not 
always sufficient to make the subtle reading distinc- 
tions intended. This will have implications on the 
kind of lexical error correction SECC can offer. As 
with the regular language pairs, in case the system 
cannot decide which meaning is intended, it offers 
all possible translations. We will come back to this in 
the next section. As a final note concerning the 
lexicon, let us add that the Metal system provides a 
wide range of tools to consult and maintain its lexi- 
cons. For example, we regularly run consistency 
checks to see if 
- all source entries in the transfer lexicon exist in the
English lexicon

- all target entries in the transfer lexicon exist in the
SE lexicon, i.e. are SE-accepted (by checking if
there is a reflexive transfer for them in the transfer
lexicon)

- no circularities occur in the transfer lexicon
(i.e. there must not be both an entry x —> y and
y —> x).
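
The three consistency checks listed above can be sketched in a few lines of Python. This is an assumption-laden illustration, not the Metal tooling: the transfer lexicon is modelled as a set of (source, target) pairs, the English lexicon as a set of words, and `check_transfer_lexicon` is a hypothetical name.

```python
def check_transfer_lexicon(entries, english_lexicon):
    """Return the problems found in a set of (source, target) pairs.

    - unknown_source: sources missing from the English lexicon
    - non_se_target:  targets without a reflexive entry (not SE-accepted)
    - circular:       pairs x -> y for which y -> x also exists
    """
    se_words = {src for src, tgt in entries if src == tgt}
    return {
        "unknown_source": sorted(
            {src for src, _ in entries if src not in english_lexicon}
        ),
        "non_se_target": sorted(
            {tgt for _, tgt in entries if tgt not in se_words}
        ),
        # src < tgt reports each circular pair once and skips reflexive entries
        "circular": sorted(
            (src, tgt) for src, tgt in entries
            if src < tgt and (tgt, src) in entries
        ),
    }
```

Reflexive entries are deliberately excluded from the circularity check, since X —> X is the marker for SE status rather than a loop.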

We also hope to be able to refine the lexical 
relationships between regular English and SE by 
extracting the ‘entry webs’ that are created as coding 
continues (see the error case above). 


Technical approach 

Given the different resources sketched in the previ- 
ous section, we will now have a closer look at a few 
interesting implementation details. 

The peculiar nature of the English-SE language 
pair is a challenge for Metal in many ways. A re- 
quirement that has influenced the organization of the 
analysis, transfer and generation phases is the com- 
plex nature of the SECC output. In the previous 
section, we mentioned that SECC outputs diagnosis
information about the original input sentence (with 
suggestions for correction or improvement) as well 
as corrected output sentences (whenever possible); it 
combines the output of a grammar checker and an 
MT system. Given the MT context of SECC, the only 
output the system gives in principle is the target 
string. Moreover, Metal works in such a way that the 
input string is gradually overwritten with the output 
string as a result of the manipulations of the tree 
constructed on top of this string. Recovering the 
input string (let alone a diagnosis-annotated input 
generation runs, a first one for the diagnosis information,
and a second one for the corrected output
sentence. The first run recursively descends the tree, 
and splits up the error identifiers put on the nodes 
during transfer into an opening and a closing tag. 
These tags are wrapped around the node, and treated 
as ‘words’ of the target language, just like the actual
words of the input sentence. The first run collects all
these ‘words’, and puts the resulting string on the
root node (S) in a feature-value pair. The second run 
is a regular Metal generation run finally transform- 
ing the transfer tree and the input string into a corrected 
SE string. At the end of this double generation cycle, 
both strings are concatenated, giving the final trans- 
lation output. 
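
The first generation run can be illustrated with a small recursive walk. The Python sketch below is a simplification under assumptions: the `Node` class, `diag_words` and `generate` are hypothetical names, error attributes such as COR are omitted, and the second run (regular Metal generation) is not modelled, so the corrected sentence is simply passed in as a string.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    word: Optional[str] = None      # set on leaves only
    error: Optional[str] = None     # error identifier attached during transfer
    children: List["Node"] = field(default_factory=list)


def diag_words(node: Node) -> List[str]:
    """First run: collect the leaves, treating error tags as extra 'words'."""
    words = [node.word] if node.word is not None else []
    for child in node.children:
        words.extend(diag_words(child))
    if node.error:
        # Split the identifier into an opening and a closing tag around the node.
        words = [f"<{node.error}>"] + words + [f"</{node.error}>"]
    return words


def generate(root: Node, corrected: str) -> str:
    """Concatenate the corrected string and the diagnosis string."""
    diag = " ".join(diag_words(root))
    return f"<CORR> {corrected} </CORR> <DIAG> {diag} </DIAG>"
```

Because tags are wrapped around whole nodes, the resulting error tags nest properly and never cross.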

To conclude this discussion about the technical 
details of the analysis-transfer-generation cycle of 
SECC, a word is in order about possible alternatives 
in the corrected output sentence. This will also ex- 
plain the appearance of an ALT (alternatives) attribute 
in the corrected output string (see the example above). 
The transfer lexicon examples given in the previous 
section have shown that sometimes alternative SE
translations exist for a word. Given the SE principle 
‘one word, one meaning’, this should only be the 
case if these alternatives represent distinct meanings 
of their English counterpart. The simple feature se- 
mantics in Metal does not always permit encoding 
these distinctions computationally; Metal's transla- 
tion process is also non-interactive, so the user is not 
asked for help during the translation. The way this 
problem is handled is by simply offering the alterna- 
tives in the output string. One alternative (the most 
frequently used one in the domain at hand, for in- 
stance) goes through the process of full sentence 
generation, and appears correctly integrated in the 
sentence (correct inflection, correct surface form of 
surrounding elements (a/an, for instance), correct 
distribution over the sentence for multiword entries, 
etc.); the others are attached to it as possible alterna- 
tives. Hence the ALT attribute in the corrected output 
sentence. (Whether one of the alternatives is chosen 
instead of the proposed one depends on the end user 
of the system.) Whereas in regular Metal language 
pairs these alternatives are base forms, in SECC the 
words undergo morphological generation, which re- 
duces the amount of post editing needed when an 
alternative is chosen. Because the attachment of this 
kind of alternative creates a grey area between real 
correction (in the strict MT sense defined above) on 
the one hand and mere suggestions on the other, we 
have considered multiple generation. This would 
mean that for each alternative (or for each possible 
combination of alternatives — a sentence can contain 
more than one choice point), the full generation 
cycle is gone through. The CORR output part would 
then be a set of fully generated sentences, with all 
alternatives worked out. This issue is currently under 
investigation, given the non-trivial changes it re- 
quires in the generation algorithm and the potential 
reason for going through the regular string generation
is to keep taking advantage of the layout-preserving 
features of Metal, not only on the text level but also 
on the sentence level (preservation of change in 
character styles — font size, bold/italic, etc.). In prin- 
ciple, a user can then run SECC, and simply get an 
output text with all corrected sentences ‘inserted’ 
into his original input text without loss of layout 
information. How can these requirements be ful- 
filled? In the first place, we designed an SGML 
output representation. It is a complex string object, 
annotated with SGML tags. Simplifying matters a 
little, it has two major elements, corresponding to the 
two types of generation SECC must perform, a 
CORR(ection) and a DIAG(nosis) element. The 
CORR element contains the corrected input sen- 
tence, or the original input sentence itself if nothing 
was found wrong or if correction was not possible. 
The DIAG element is more complex: it repeats the 
input sentence, with non-crossing error tags gener- 
ated around the linguistic entity they apply to. An 
example can make this more concrete: 

The module's major function is to localise the 
fault 
<CORR>

The <ERR-W64 ALT=(‘primary’)>most important</ERR-W64>
function of the module is to locate
the fault.
</CORR> 
<DIAG> 
<ERR-S30> 
<ERR-WG78> 
the module’s 
<ERR-W64 COR=(‘important’ ‘primary’)> 
major 
</ERR-W64> 
function 
</ERR-WG78> 
is to 
<ERR-W64 COR=(‘locate’)> 
localise 
</ERR-W64> 
the fault 
</ERR-S30> 
</DIAG> 
ERR-S30: End a sentence with a full stop. 
ERR-WG78: Use of in the genitive case when a 
possessive noun form is inanimate. 
ERR-W64: Use only SE-accepted words. 

(Note the correction of major to the superlative of
important, as well as the preservation of the boldface 
in the corrected noun phrase.) 

This object contains everything needed to present 
any part of its information (in particular to a user, see 
below) in whatever way one chooses, given SGML 
support tools. Storing the object in a structured data- 
base (for instance, to use it for evaluation purposes) 
is also straightforward. 

How does Metal generate such a complex string 
instead of a ‘simple’ target sentence? There are two 
Another rate which we think is interesting to
consider in evaluation of grammar checkers (and 
especially correctors) is what could be called the 
convergence rate. For SECC, it is related to what 
happens when the tool is applied to a text in different 
cycles. Suppose it is run on an input text, producing 
a number of automatically corrected sentences. These
sentences are then resubmitted to the same version of 
the tool. In principle, this second run (or any subse- 
quent run, for that matter) should leave them 
untouched. Otherwise, there are problems with the 
application of the grammar rules (sloppy correction, 
rules feeding each other, etc.). For the SECC proto- 
type we (again arbitrarily) set the required 
convergence rate at 80%: at most two sentences out 
of ten automatically corrected ones may generate 
error messages. 
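
The convergence rate can be computed as follows. This Python sketch is an illustration only: `check` stands for a hypothetical callable that runs the checker on one sentence and returns its list of error messages, and the convention of returning 1.0 for an empty input is an assumption.

```python
def convergence_rate(corrected_sentences, check):
    """Share of automatically corrected sentences that pass a second run.

    `check` is a hypothetical callable returning the list of error
    messages the tool produces for one sentence.
    """
    if not corrected_sentences:
        return 1.0  # assumed convention: nothing to re-check
    clean = sum(1 for sentence in corrected_sentences if not check(sentence))
    return clean / len(corrected_sentences)
```

With ten corrected sentences of which two still trigger messages, the function returns 0.8, which is exactly the 80% threshold.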


SECC user interfaces 

In the preceding sections we have concentrated on 
the sentence-level lingware (grammar and lexicon) 
activity of SECC. An important part of the software
work concerns the creation of the user interfaces (in
Motif on Unix platforms). Due to space limitations, 
we cannot include screendumps of the main SECC 
windows; we will briefly give a very general descrip- 
tion of the functionality of the different interfaces. In 
this section, the batch interfaces are described; in the 
next section, the future interactive interface is touched
upon. As a terminological note, the distinction batch-
interactive refers to the way a request for a SECC run
is handled. In batch mode, a complete file is submit- 
ted to the SECC processor, queued, processed, and 
file output is sent back at a later stage. In interactive 
mode, a document fragment selected inside a text 
processor is sent directly to the SECC processor, and 
an output result is expected within a few seconds; 
from the user point of view, a good response time is 
crucial. The reason for stressing this technical dis- 
tinction is that interactive can also be used to refer to 
the interaction between the computer and the user. 
Even in a batch system, a lot of this interaction can 
take place. For SECC, this is certainly the case, as 
will be described below: the user can walk through 
the results of the batch process in a way that implies 
a fair amount of system-user interaction. 

A first interface (the independent batch interface)
allows access to SECC outside of any text editor or 
desk top publishing package. It is an extension to the 
existing Metal interface, and mainly aimed at batch 
processing of input documents originating from dif- 
ferent text processing systems (Word, WordPerfect, 
FrameMaker, Interleaf, Ventura,...). Metal separates 
the text from the layout information, and the text is 
run through SECC in batch mode. Two output files 
are produced: an optional one with just the results of 
correction (taking over the correct input sentences, 
and replacing the erroneous ones with the proposed 
correction), and an error report file with fully SGML- 
tagged sentences (including SECC statistics at the 
negative effects on system performance. Currently,
we also do not have a clear idea of the amount of 
variation that will be generated in the corrections. If 
we respect the nature of SE, this amount should be 
kept to a minimum, even if the system is not power- 
ful enough to deal with all semantic and pragmatic 
subtleties. Finally, multiple generation would re- 
quire a different (more complicated?) user 
presentation than the one foreseen now. 
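
The cost of multiple generation is easy to see with a small enumeration. The Python sketch below is an illustration under assumptions: a sentence is modelled as a list of slots, each holding one or more surface alternatives, and `expand_alternatives` is a hypothetical name. It only concatenates strings and ignores the morphological and agreement adjustments that a full generation cycle would perform.

```python
from itertools import product


def expand_alternatives(slots):
    """Yield every sentence obtainable from a list of choice points.

    `slots` is a list of lists; a fixed word is a one-element list.
    """
    for choice in product(*slots):
        yield " ".join(choice)
```

The number of generated sentences is the product of the slot sizes, so two choice points with two and three alternatives already give six full sentences; this growth is one reason to keep the variation small.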


Evaluation issues 
As we already mentioned in the general project 
description, SECC's target domain is that of te- 
lephony, with Alcatel Bell as provider of resources in 
this domain. One of these resources is a 2,000- 
sentence corpus, which we annotated manually for 
all SE problems. It has already formed an invaluable 
source of information to tune COGRAM and illus- 
trate it with real-life examples for the intended users. 
Once the major parts of COGRAM, COLEX and 
COTECH are implemented, the corpus will play a 
central role in system tests. The question then arises 
how the SECC output will be evaluated. In Metal's 
current testing and evaluating environment Sisyphus 
(see Adriaens et al. (2)), regular translation output is 
rated by human evaluators as to correctness or 
understandability. Output quality then receives a 
score based on the amount of correct, wrong and 
understandable translations. For SECC, the com- 
plexity of the output does not allow 'simple' 
judgements. Still, we can say something about the 
evaluation criteria we intend to use. We also refer to 
Wojcik et al. (12) for a report on the evaluation of the
Boeing Simplified English Checker, where compa- 
rable problems and criteria are discussed in detail. 
In the context of information retrieval and error 
treatment in computer programs, the notions of pre-
cision and recall (associated with the complementary
notions of noise and silence, or overkill and under- 
shoot) are used for evaluation. In the context of 
grammar checking, they refer to the following rates: 


\[
\text{Precision} = \frac{\text{Number of correctly flagged errors}}{\text{Total number of errors flagged}}
\]

\[
\text{Recall} = \frac{\text{Number of correctly flagged errors}}{\text{Total number of errors actually occurring}}
\]


Good precision requires a low rate of spurious 
errors (noise, overkill); good recall requires a low rate 
of missed errors (silence, undershoot). To give an 
example (the only one currently available for grammar 
checkers): Wojcik et al. (12) report a precision rate of 
about 80% (20% noise), and a recall rate of about 90%
(10% silence) for the tests they ran with their Boeing 
SEC. For the SECC prototype, we are aiming at a 
precision rate of 75% and a recall rate of 80%, although
these figures are quite arbitrary given the non- 
availability of sufficient comparative material. 
Moreover, all figures are oversimplifications because 
they treat errors on a par; more refined calculations 
should take into account the relative weight of errors. 
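The two rates defined above, and the weighting refinement just mentioned, can be sketched as follows. This is a minimal illustration and not part of SECC: the function name, the representation of errors as hashable identifiers and the optional `weight` table are all assumptions made for the example.

```python
def precision_recall(flagged, actual, weight=None):
    """Return (precision, recall) for a grammar checker run.

    flagged -- set of error identifiers reported by the checker
    actual  -- set of error identifiers found by manual annotation
    weight  -- optional dict giving a relative weight per error
               (hypothetical refinement; unlisted errors weigh 1.0)
    """
    def total(errors):
        if weight is None:
            return float(len(errors))
        return sum(weight.get(e, 1.0) for e in errors)

    correct = flagged & actual  # correctly flagged errors
    precision = total(correct) / total(flagged) if flagged else 0.0
    recall = total(correct) / total(actual) if actual else 0.0
    return precision, recall


# Five errors flagged, four of them real; eight real errors in total.
flagged = {1, 2, 3, 4, 5}
actual = {1, 2, 3, 4, 6, 7, 8, 9}
p, r = precision_recall(flagged, actual)
print(f"precision={p:.2f} (noise={1 - p:.2f}), recall={r:.2f} (silence={1 - r:.2f})")
```

Without weights every error counts equally, which is exactly the oversimplification noted above; passing a weight table lets serious errors dominate both rates.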

checker either. SECC will try to capture some of the
most frequent grammatical and lexical mistakes made
by Flemish¹⁰, French and German writers. From the
three studies we did on this subject, the following 
sentences contain some typical examples: 


Flemish writer 

Eventually, another telephone may work perfect too. 
(Eventually: Dutch eventueel = English possible; 
should be Possibly/Maybe; perfect: Dutch adverbs 
have the same form as their counterpart adjectives; 
should be perfectly.) 


French writer 

It is a well argumented text in which is discussed the
problem of system maintenance. (argument < French 
argumenter, English = argue; wrong (French) 
subclause inversion) 


German writer 

Our system is better as the one who is offered by the 
competition. (German besser als, English than; 
German classification of relative pronouns is gen- 
der-based, not sex-based, hence confusion who/that) 


We are fully aware of the additional complexity 
this adds to the system. Up to now, SECC did not 
need to intervene during the analysis phase: the input 
is expected to be regular English. Given non-native 
mistakes, the analysis phase in itself will need a 
recovery or correction component to obtain an analy- 
sis that is usable by the COGRAM rules in transfer. 
The techniques we are investigating here are rule 
condition relaxation (see e.g. Ravin (6), or Alonso 
(3) for Spanish in the Metal context) as well as the 
addition of low-level fallback rules to be applied if 
the regular grammar rules fail. In any case, because 
of potential unpredictable interference effects inside 
the grammar, the non-native check will be a feature 
that can at all times be switched off by the user. 

All the work in progress reported in the preceding 
sections actually relates to phase 1 of the SECC 
project, the construction of a sentence-level batch 
tool. Phase 2 will broaden the scope of both dimen- 
sions, leading to a beyond-sentence interactive tool. 
As to the beyond-sentence dimension, it relates to 
the COGRAM rules that govern the textual entities 
above sentence level (paragraph, subsection, sec- 
tion, the whole text). SECC will also handle the 
implementable subset of those rules, and produce 
appropriate error messages and possibly corrections. 
To build this part of the system, we are leaving the 
realm of the sentence-based Metal system, and enter 
that of text parsers. The exact approach (and its 
integration with the sentence-level SECC) still has to 
be worked out, but it is clear SGML and its associ- 
ated tools will again play an important role. As to the 
interactive dimension, the goal is to be able to run 
SECC online on selected text fragments while 
working inside Interleaf. Issues here are job queue

end). If the user chooses to, he can post-edit this file
by consulting the diagnosis information and modify- 
ing the proposals for correction in a simple editor. 
Eventually, the correction part of this file (untouched 
or post-edited) is restored in its original layout. Ad- 
ditional functionality offered is related to setting 
parameters of SECC (level of expertise, activation of 
non-native support, etc.), choosing on what server to 
execute the SECC run, and manipulating the queue 
of SECC jobs. 

Whereas the Metal system is normally an inde- 
pendent application that does not run inside a text 
processing system, one of the requirements for SECC
is that it does run inside such a system, namely 
Interleaf. This second SECC interface is the inte- 
grated batch interface. Interleaf was chosen because 
it is the company-wide standard of the intended user 
in the SECC consortium (Alcatel Bell), and also 
because it offers good tools for integrating applica-
tions into it. In the meantime, Interleaf 6 also has
become Motif-based, and it supports SGML. In or- 
der to meet the Interleaf integration requirement, a 
Metal Application Programming Interface was first 
constructed so that Metal functionality is accessible 
to other applications. Using this API⁸, all the func-
tionality described in the previous paragraph for the
independent batch interface is also implemented in- 
side Interleaf. In addition, it offers more bells and 
whistles as to the presentation of error messages, 
pasting corrections into the document, setting SECC 
parameters, etc. Finally, users can suggest changes 
to COLEX or COTECH via a coding interface. Note, 
by the way, that given the nature of SE, lexical 
changes cannot be made freely, but need to go through 
a lexicon administrator. 

As a general characteristic, the SECC interfaces 
are text-driven (like most other SE checkers): error 
messages and corrections are stored with each sen- 
tence object, and are normally accessed via that 
object. Another approach one can take to the presen-
tation of the tool's results is a grammar-driven one.
Here, the applied rules determine the view one gets 
of the results, and actions are associated with a rule 
or rule class (retrieving all sentences to which the 
rule applies, for instance).⁹ Both views are comple-
mentary, but in the current planning of SECC we will
concentrate on the text-driven approach. 


Open issues and future work 

An important issue we have not gone into yet (also
because we cannot report on implementation results) 
is the non-native support SECC wants to offer. As 
mentioned before, in a multilingual context where 
English plays an important role many non-native 
writers produce technical documents in English or 
SE. As earlier tests have shown (see Adriaens 
& Schreurs (1)), the majority of mistakes made are
due to interference of the native language and do not
concern SE in the first place. However, if nothing is 
done about them, there is not much use for an SE 

12. WOJCIK, R. H., HARRISON, P. and BREMER,
J. Using bracketed parses to evaluate a grammar
checking application, Proceedings of ACL93, 1993,
pp. 38-45.


1 SECC is also: Lieve Macken, Luc Pauwels (both
University of Leuven), Anne Derain (Cap Gemini), 
Frederik Durant (Siemens Nixdorf), Patrick 
Goyvaerts (Alcatel Bell), and Uus Knops (Sietec) 
whose teamwork has formed the basis of this 
paper. Gert De Braekeleer and Bart Depoortere 
(both ex-Siemens Nixdorf) also helped lay the 
foundations of SECC. 

2 When no overall sentence analysis is found, the 
Metal chart parser returns a so-called phrasal analy-
sis, a collection of the largest phrases identified in
the sentence. These 'chunks' can then still un-
dergo useful further processing (cf. Critique's
fitted parse, Ravin (6)).

3 We will not go into the different ways one can 
define Simplified English. See on this and other 
SE-related matters Wojcik et al. (11, 12), van der 
Steen and Dijenborgh (8); Adriaens and Schreurs 
(1); Schreurs and Adriaens (7); Humphreys (4), 
Pulman and Rayner (5). 

4 In the Metal context, grammar and style checking
experiments (without correction) of this kind had 
already been done for German before SECC in the 
context of the TWB Esprit project (see Thurmair 
(9, 10)). 

5 The actual COGRAM rule is more elaborate than 
this; matters are simplified here. 

6 For evaluation of SECC, we will closely collabo-
rate with the TSNLP LRE project (Test Suites for
Natural Language Products). TSNLP will use 
COGRAM and COLEX as a basis for constructing 
test suites for controlled language checkers like 
SECC. 

7 For a checker not doing correction, the conver- 
gence test could be applied to human corrections 
of a text. For SECC, this could also be done, but 
then a distinction should be made between human 
and computer corrections. 

8 Although we could have accessed Metal function- 
ality directly in the independent batch interface, 
we also used the API to build it, because it requires 
less system-dependent programming. 

9 See e.g. the checklist facility in Oracle's CoAuthor
(a commercial SE checker), or the more recent SE 
checker developed by GSI-Erli in the context of 
the GRAAL project. 

10 For clarity's sake: both Dutch people (i.e. people
living in the Netherlands) and Flemish people (i.e. 
people living in the northern part of Belgium) 
have Dutch as their language. The technical writers 
in the consortium are Flemish. 

management (of batch and interactive requests to the
same processor), acceptability of response times, 
new user interface matters (result presentation and 
integration). 

In short, we have travelled a bit, but there is still 
a long way to go. 

Abstract

This paper describes the research and development activities carried out in the framework of the Translearn 
project. The aim of the project is to build a translation memory tool and the appropriate translation work 
environment. Translearn's application corpus consists of regulations and directives of the European Union (EU), 
extracted from the CELEX database, the EU's documentation system on EU law, and the language versions it 
concentrates on are English, French, Portuguese and Greek. The development of the prototype tool for the 
envisaged system proves the application's usefulness in the translation process of international multilingual
organizations as well as in the localization-internationalization process of international enterprises.


The key issues of the approach revolve around 
three major axes: 


• organization of multilingual parallel corpora, i.e.
texts in different languages, one being the transla-
tion of the other

• alignment of parallel texts, i.e. establishment of
correspondences between units of parallel texts

• text matching techniques.


The targeted 'end product' is a prototype transla-
tion memory tool and the appropriate translation
work environment for machine assisted translation 
in multilingual professional environments like trans- 
lation departments of international organizations and 
enterprises. 

In section II, an overview of the approaches to the 
key issues of Translearn is discussed. In section III, 
text preprocessing and in particular the techniques 
adopted for text alignment are presented together 
with examples of aligned text derived from the appli- 
cation corpus. In section IV the text matching tool is 
discussed, while in section V the overall system 
architecture is sketched. In section VI the application 
of the translation memory tool on the CELEX data-
base is discussed.


II. Background 

The technology underlying translation memory ap- 
plications stems from what has been described in the 
literature as example-based machine translation 
(EBMT). EBMT is based on the idea of performing 
translation by imitating translation examples of simi- 
lar sentences (Nagao 84). In this type of translation 
system, a large amount of bi/multi-lingual transla- 
tion examples has been stored in a textual database 
and input expressions are rendered in the target 
language by retrieving from the database that exam- 
ple which is most similar to the input. 

I. Introduction

This paper describes the research and development
activities carried out in the framework of the LRE/
Translearn project. The project's conception stems
from the observation that translation work is very
frequently characterised by two parameters: repeti-
tion and high demand on quality. This is particularly
true for translation of technical and administrative
documentation, becoming more evident in the case
of law documents (contracts, regulations, etc.) and
product documentation (manuals, etc.) where repeti-
tion of blocks of text may reach a rate of 70% and
sometimes higher.

The aim of this project is to tackle this problem 
by providing a computational environment, in more 
practical terms a toolbox that will: 

• rid translators of the repetitive part of their work
by reusing existing human translations and learn-
ing from them

• enhance quality and consistency of translation
by being able to integrate ancillary translation
tools.

Appropriate storage of pairs of source language 
(SL) and target language (TL) blocks of text and 
provision of means for retrieval of applicable solu- 
tions and means for post-editing them would increase 
the productivity of a translator and at the same time 
improve the quality and consistency of the transla- 
tion (Freibott 92) (Ishida 94). 

The project's descriptive goal is to develop a 
machine translation aid tool dedicated to managing 
repetition phenomena in the translation of specific 
types of text. Its methodological goal is to employ 
sophisticated text matching techniques in order to 
identify the longest coherent part of source language 
text that is identical or similar to an input to-be- 
translated-text and retrieve from the memory the 
corresponding target language text. 

(ii) the definition of the metric of similarity between
two text units. 

As far as the decision about the text unit is 
concerned, the obvious choice is to use as text unit 
the sentence. This is because not only are sentence 
boundaries unambiguous, but also translation pro- 
posals at sentence level are what a translator is usually 
looking for. Sentences can, however, be quite long. 
And the longer they are, the less likely it is that
they will have a perfect match in the translation ar-
chive, and the less flexible the EBMT system will be.

On the other hand, if the text unit is the sub- 
sentence, we face one major problem: that is the 
possibility that the resulting translation of the whole 
sentence will be of low quality, due to boundary 
friction and incorrect chunking. In practice, EBMT 
systems that operate at sub-sentence level involve 
the dynamic derivation of the optimum length of 
segments of the input sentence by analysing the 
available parallel corpora. This requires a procedure 
for determining the best ‘cover’ of an input text by 
segments of sentences contained in the database 
(Nirenburg 93). It is assumed that the translation of 
the segments of the database that cover the input 
sentence is known. What is needed, therefore, is a 
procedure for aligning parallel texts at sub-sentence 
level (Kaji 92, Sadler 90). If sub-sentence alignment 
is available, the approach is fully automated but is 
quite vulnerable to the problem of low quality as 
mentioned above, as well as to ambiguity problems 
when the produced segments are rather small. De- 
spite the fact that almost all running EBMT systems 
employ the sentence as the text unit, it is believed 
that the potential of EBMT lies in the exploitation of
fragments of text smaller than sentences and the 
combination of such fragments to produce the trans- 
lation of whole sentences (Sato 90). Automatic 
sub-sentential alignment is, however, a problem yet 
to be solved. 
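The notion of a 'cover' of the input by database segments can be made concrete with a small dynamic programming sketch. It is an illustration only, under simplifying assumptions of our own: fragments are matched as exact word sequences, the best cover is taken to be the one with the fewest segments, and the names `best_cover` and `fragments` are hypothetical. The systems cited above use richer criteria.

```python
def best_cover(words, fragments):
    """Cover `words` with the fewest fragments known to the database.

    words     -- list of tokens of the input sentence
    fragments -- set of token tuples for which a translation is known
    Returns a list of token tuples, or None if no complete cover exists.
    """
    n = len(words)
    best = [None] * (n + 1)  # best[i]: cheapest cover of words[:i]
    best[0] = []
    for end in range(1, n + 1):
        for start in range(end):
            segment = tuple(words[start:end])
            if best[start] is None or segment not in fragments:
                continue
            candidate = best[start] + [segment]
            if best[end] is None or len(candidate) < len(best[end]):
                best[end] = candidate
    return best[n]


fragments = {
    ("having", "regard", "to"),
    ("the", "treaty"),
    ("establishing",),
    ("the", "european", "economic", "community"),
}
sentence = "having regard to the treaty establishing the european economic community"
print(best_cover(sentence.split(), fragments))
```

Each boundary between two segments of the cover is a point where boundary friction may occur, which is one reason for preferring covers with few, long segments.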

Turning to the definition of the metric of similar-
ity, the requirement is usually twofold. The similarity
metric applied to two sentences (by sentence from 
now on we will refer to both sentence and sub- 
sentence fragment) should indicate how similar the 
compared sentences are, and perhaps the parts of the 
two sentences that contributed to the similarity score. 
The latter could be just a useful indication to the 
translator using the EBMT system, or a crucial func- 
tional factor of the system as will be later explained. 

The similarity metrics reported in the literature 
can be characterized depending on the text patterns 
they are applied on. So, the word-based metrics 
compare individual words of the two sentences in 
terms of their morphological paradigms, synonyms, 
hyperonyms, hyponyms, antonyms, pos tags 
(Nirenburg 93) or use a semantic distance d (0≤d≤1)
which is determined by the Most Specific Common 
Abstraction (MSCA) obtained from a thesaurus ab- 
straction hierarchy (Sumita 91). Then, a similarity 
metric is devised, which reflects the similarity of two 

There are three key issues which pertain to example-based
translation:

• establishment of correspondence between units in
a bi/multi-lingual text at sentence, phrase or word
level, i.e. alignment of parallel texts

• a mechanism for retrieving from the database the
unit that best matches the input

• exploiting the retrieved translation example to
produce the actual translation of the input sen-
tence.

Several different approaches have been proposed 
tackling the alignment problem at various levels. 
Catizone's technique (Catizone 89) was to link re- 
gions of text according to the regularity of word 
cooccurrences across texts. (Brown 91) described a 
method based on the number of words that sentences 
contain. Moreover, certain anchor points and para- 
graph markers are also considered. The method has 
been applied to the Hansard Corpus and has achieved 
an accuracy between 96% and 97%.

(Gale 91) proposed a method that relies on a 
simple statistical model of character lengths. The 
model is based on the observation that longer sen- 
tences in one language tend to be translated into 
longer sequences in the other language while shorter 
ones tend to be translated into shorter ones. Although 
the apparent efficacy of the Gale-Church algorithm 
is undeniable and validated on different pairs of 
languages (English, German, French, Czech,
Italian), it seems to be awkward when handling 
complex alignments. Complex alignments are defined 
to be alignments in which the 1-1 correspondence 
between text units in the parallel texts does not hold, 
and they are usually due to mergers of sentences
occurring during the translation process. In the Gale- 
Church algorithm the 2-1 alignments had five times 
the error rate of 1-1. The 2-2 category disclosed a 
33% error rate, while the 1-0 or 0-1 alignments were 
totally missed. 
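A length-based aligner of this kind can be sketched in a few lines. The sketch below follows the general scheme described above (a cost derived from character lengths, minimized by dynamic programming over 1-1, 1-0, 0-1, 2-1, 1-2 and 2-2 pairings). The numeric parameters (`MEAN_RATIO`, `VARIANCE`, `PRIORS`) are illustrative assumptions for this example and are not taken from the present paper; all names are hypothetical.

```python
import math

MEAN_RATIO = 1.0   # assumed target characters per source character
VARIANCE = 6.8     # assumed variance of that ratio per character
PRIORS = {         # assumed prior probability of each pairing type
    (1, 1): 0.89, (1, 0): 0.0099, (0, 1): 0.0099,
    (2, 1): 0.089, (1, 2): 0.089, (2, 2): 0.011,
}


def length_cost(len_src, len_tgt):
    """Negative log of the two-tailed probability of the length mismatch."""
    if len_src == 0 and len_tgt == 0:
        return 0.0
    mean = (len_src + len_tgt / MEAN_RATIO) / 2.0
    delta = (len_tgt - len_src * MEAN_RATIO) / math.sqrt(mean * VARIANCE)
    prob = math.erfc(abs(delta) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|delta|))
    return -math.log(max(prob, 1e-300))


def align(src_lens, tgt_lens):
    """Return the least-cost alignment as a list of (src_indices, tgt_indices)."""
    n, m = len(src_lens), len(tgt_lens)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for (di, dj), prior in PRIORS.items():
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                step = length_cost(sum(src_lens[i:ni]), sum(tgt_lens[j:nj]))
                total = cost[i][j] + step - math.log(prior)
                if total < cost[ni][nj]:
                    cost[ni][nj] = total
                    back[ni][nj] = (i, j)
    pairs, i, j = [], n, m
    while (i, j) != (0, 0):  # walk the backpointers from the end
        pi, pj = back[i][j]
        pairs.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    pairs.reverse()
    return pairs


# Two short source sentences merged into one target sentence (a 2-1 alignment).
print(align([100, 40, 60], [105, 98]))
```

Because the only evidence is length, the low priors on 1-0, 0-1 and 2-2 pairings make such alignments expensive to propose, which is consistent with the error rates reported above for the complex categories.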

(Simard 92) argues that a small amount of lin-
guistic information is necessary in order to overcome
the inherent weaknesses of the Gale-Church method.
He proposed using cognates, which are pairs of 
tokens of different languages which share ‘obvious’ 
phonological or orthographic and semantic proper- 
ties, since these are likely to be used as mutual 
translations. (Papageorgiou 94) proposed a generic 
alignment scheme invoking surface linguistic infor- 
mation coupled with information about possible unit 
delimiters depending on the level at which alignment
is sought. Each unit, sentence, clause or phrase, is 
represented by the sum of its content part of speech 
tags. The results are then fed into a dynamic pro- 
gramming framework that computes the optimum 
alignment of text units. 
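To make the cognate idea concrete, the sketch below uses one simple operational test, assumed here for illustration: tokens containing digits must be identical, alphabetic tokens count as cognates when their first four characters agree, and any other token must match exactly. The function names are hypothetical, and the test looks only at orthography, not at the phonological or semantic properties mentioned above.

```python
def is_cognate(tok_a, tok_b, prefix_len=4):
    """Orthographic cognate test (simplified assumption, see lead-in)."""
    a, b = tok_a.lower(), tok_b.lower()
    if any(ch.isdigit() for ch in a + b):
        return a == b  # numbers, dates, document references
    if a.isalpha() and b.isalpha():
        return (len(a) >= prefix_len and len(b) >= prefix_len
                and a[:prefix_len] == b[:prefix_len])
    return a == b  # punctuation and mixed tokens


def cognate_count(tokens_a, tokens_b):
    """Number of tokens in tokens_a that have a cognate in tokens_b."""
    return sum(any(is_cognate(a, b) for b in tokens_b) for a in tokens_a)


english = ["Having", "regard", "to", "the", "Treaty", "establishing",
           "the", "European", "Economic", "Community"]
french = ["vu", "le", "traité", "instituant", "la", "Communauté",
          "économique", "européenne"]
print(cognate_count(english, french))
```

A count of this kind can be combined with the length-based score so that candidate pairs sharing many cognates are preferred. Note that the crude prefix test misses accented variants such as Economic/économique.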

In establishing a mechanism for the best match 
retrieval two crucial tasks are identified: 

(i) determining whether the search is for matches at
sentence or sub-sentence level, that is determin-
ing the 'text unit', and

centred phrases, etc. must be given to the user in
their original form. 

The texts contained in the Translearn application 
corpus are in simple unstructured ASCII format, i.e.
word-processing and/or typesetting information has 
already been excluded. 

As already discussed briefly above, alignment 
consists in establishing correspondence links be- 
tween units in a bi/multi-lingual text. The heart of the 
alignment scheme, adopted in Translearn, is a method 
for aligning sentences based on a simple statistical 
model of character lengths (Gale 91). The method 
relies on the assumption that longer sentences in the 
source language tend to be translated into longer 
sentences in the target and that shorter sentences in 
the source are translated into shorter sentences in the 
target. A probabilistic score is assigned to each
proposed sentence pair, based on the ratio of
lengths of the sentences and the variance of this ratio.
This probabilistic score is used in a dynamic pro- 
gramming framework in order to find the maximum 
likelihood alignment of sentences. The whole proc- 
ess proceeds in two steps. First, paragraphs are aligned 
and then sentences within a paragraph are aligned. 
Apparently, for the method to work well, the texts 
should have exactly the same number of large re- 
gions, bearing the same structure. In case sentences 
have been added or deleted during the translation of 
source into target, this method is expected to fail. It 
would be desirable for the method to provide ways 
for setting anchors between the two texts and be able 
to align texts above or below the anchors. Exten- 
sions, some of which follow from the Translearn text 
structures, proposed by (Brown 91) have also been 
taken into account. Instead of measuring sentence 
lengths in characters, they are measured by the number 
of words they contain. Additionally, certain points of 
the texts can be anchored (thus dividing the texts into 
smaller sections to be aligned). Besides anchors, para-
graph markers are also considered. Anchor points
are specific to the text to be aligned and they usu- 
ally appear in both texts. They are divided into 
major and minor anchors and alignment proceeds in 
two steps, first aligning major anchor points and 
then minor anchor points. In the first step, align-
ments of major anchors are assigned a cost. A
dynamic programming algorithm finds the align- 
ment of major anchors in the two texts with the least 
total cost. This first step outputs the texts as chunks 
of text between aligned major anchors. In the sec- 
ond step, chunks of text are retained that contain the 
same number of minor anchors which divide the 
remaining pieces into smaller sections that may ex- 
tend from one to many sentences. Then, the pieces 
lying between minor anchors are aligned at sentence 
level using a hidden Markov model that generates 
aligned pairs with the assumption that a sentence in 
one language can yield zero, one or two sentences in 
the other language. The method has been applied to 
the Canadian Hansard (parallel English-French) 




sentences, by combining the individual contribu- 
tions towards similarity stemming from word 
comparisons. 

The word-based metrics are the most popular, but 
other approaches include syntax-rule driven metrics 
(Sumita 88), character-based metrics (Sato 92) as 
well as some hybrids (Furuse 92) (Cranias 94). The 
character-based metric has been applied to Japanese, 
taking advantage of certain characteristics of Japa- 
nese. The syntax-rule driven metrics try to capture 
similarity of two sentences at the syntax level. This 
seems very promising, since similarity at the syntax 
level, perhaps coupled by lexical similarity in a 
hybrid configuration, would be the best an EBMT 
system could offer as a translation proposal. The
real-time feasibility of such a system is, however, ques-
tionable, since it involves the complex task of
syntactic analysis. 

The third key issue of EBMT, that is exploiting 
the retrieved translation example, is usually dealt 
with by integrating into the system conventional MT 
techniques (Kaji 92), (Sumita 91). Simple modifica- 
tions of the translation proposal, such as word 
substitution, would also be possible, provided that 
alignment of the translation archive at word level 
were available. 


III. Text preprocessing 
In order to be able to make full use of parallel 
corpora, the corpora have to be rendered in an 
appropriate form. To this end, corpora have to be
normalized, handled and aligned. Normalization
consists in removing from the body of the multilingual
corpus all those sections or information that are not ex-
ploitable for text translation purposes.
Text handling can be seen as a sophisticated 
interface between input text streams and various text 
manipulation modules. At the stage of analysis, the 
text handler has the responsibility of transforming a 
text from the original form in which it is found into a 
form suitable for the manipulation required by the 
application; at the stage of synthesis, it isresponsible 
for the reverse process, i.e. for converting the output 
text from the form used by the application into a form 
equivalent to that of the input text. The main opera-
tions usually associated with the text handler include:

• analysis of the format of the physical appearance
of the input text (as evidenced by the word-process-
ing and/or typesetting commands, such as bold
and italic characters, indentation, etc.) and map-
ping of these into a standardized markup language
or a canonical form recognized by the application

• identification of textual units at the level of para-
graphs and sentences

• identification of extra-linguistic elements, such as
dates, abbreviations, acronyms, list enumerators,
numbers, etc.

• at the stage of synthesis, conversion of the output
of the application into the same format recognized
at the stage of analysis; e.g. italicized characters,

corpus was low, a fact attributable to the corpus type.
To illustrate, in Figure 1, aligned sentences taken 
from the English-French pair of the application cor- 
pus are presented. The format shows sentences in 
alternate languages; each English sentence is aligned 
with the French sentence that follows it. Markers in 
angled brackets (<S>) are used for sentence-end 
annotation. 


corpus, which is structured and in which anchor 
points are easily detected. The approach, however, 
also works where anchors are rare. 

Experiments with the statistical techniques ap- 
plied on Translearn's application corpus showed that 
alignment can achieve a rate higher than 96%. Not
unexpectedly, the rate of complex alignments (2-1, 
1-2, 2-2, 0-1, 1-0) resulting from the application 


COMMISSION REGULATION (EEC) No 486/89 of 27 February 1989 on the sale by the procedure laid
down in Regulation (EEC) No 2539/84 of beef held by certain intervention agencies and intended for
export, amending Regulation (EEC) No 569/88 and repealing Regulation (EEC) No 3627/88 <S>


RÈGLEMENT (CEE) No 486/89 DE LA COMMISSION du 27 février 1989 relatif à la vente, dans le cadre
de la procédure définie au règlement (CEE) no 2539/84, de viandes bovines détenues par certains
organismes d'intervention et destinées à être exportées, modifiant le règlement (CEE) no 569/88 et
abrogeant le règlement (CEE) no 3627/88 <S>

# 

THE COMMISSION OF THE EUROPEAN COMMUNITIES, <S> 


LA COMMISSION DES COMMUNAUTÉS EUROPÉENNES, <S> 


# 
Having regard to the Treaty establishing the European Economic Community, <S> 


vu le traité instituant la Communauté économique européenne, <S> 

# 

Having regard to Council Regulation (EEC) No 805/68 of 27 June 1968 on the common organization of 
the market in beef and veal (1), as last amended by Regulation (EEC) No 4132/88 (2), and in particular 
Article 7 (3) thereof, <S> 


vu le règlement (CEE) no 805/68 du Conseil, du 27 juin 1968, portant organisation commune des
marchés dans le secteur de la viande bovine (1), modifié en dernier lieu par le règlement (CEE) no 4132/88
(2), et notamment son article 7 paragraphe 3, <S>

# 


one exist in the target language. The approach adopted
to text matching is based on computations of com- 
mon elements between an input sentence and a 
database sentence and computation of consecutive 
elements in them. The level at which computations 
of common elements are performed can vary be- 
tween wordform level and lemma-tag level, i.e. 
computations are either based on wordforms and 
their respective position in the compared sentences 
or on lemma-tag tuples of each word in the compared
sentences as well as their respective positions in 
them. The level of computations depends on the 
availability of linguistic processors for the language 
pair at hand. In case linguistic processors are avail- 
able, the level of computation is externally 
configurable by the user. 
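The two computations named above, common elements and consecutive elements, can be illustrated at either level with the same routine, since wordforms and lemma-tag tuples are both just comparable units. The sketch is ours and makes assumptions the text does not state: common elements are counted as a multiset intersection, consecutive elements as the longest shared contiguous run, and the name `common_and_consecutive` is hypothetical.

```python
from collections import Counter


def common_and_consecutive(input_units, db_units):
    """Return (number of common units, longest common consecutive run).

    Units may be wordforms (strings) or (lemma, tag) tuples.
    """
    common = sum((Counter(input_units) & Counter(db_units)).values())
    longest = 0
    prev = [0] * (len(db_units) + 1)
    for unit_a in input_units:
        cur = [0] * (len(db_units) + 1)
        for j, unit_b in enumerate(db_units, start=1):
            if unit_a == unit_b:
                cur[j] = prev[j - 1] + 1  # extend the run ending just before
                longest = max(longest, cur[j])
        prev = cur
    return common, longest


# Wordform level: inflected forms do not match.
print(common_and_consecutive("agencies hold beef".split(),
                             "agency holds beef".split()))

# Lemma-tag level: the same two sentences now match completely.
lemma_tagged = [("agency", "N"), ("hold", "V"), ("beef", "N")]
print(common_and_consecutive(lemma_tagged, lemma_tagged))
```

The contrast between the two calls shows why the lemma-tag level is preferable when linguistic processors are available: it abstracts away from inflection that would otherwise hide a match.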

The matching tool first searches for perfect matches 
between the input and the database sentences. In doing 
so, it does not take into account extra-linguistic tokens
of the sentences like dates and numbers, so that lin- 
guistically real perfect matches are not missed due to 
minor differences. If no perfect match is found, the 

Figure 1: CELEX aligned sentences


Depending on the availability of corpus linguistic 
annotators in the languages represented in the multi- 
lingual corpus, the corpus is lemmatized and tagged 
for grammatical category (part of speech, pos). 
Lemmatization consists in deriving the lemma or 
canonical form of each wordform while tagging con-
sists in labelling each wordform with its grammatical
category or part of speech. Ambiguities stemming 
from multiple possible lemma and tag assignments 
are not resolved and all possible values are stored in 
the memory. 


IV. Text matching 

The core of the system is its text matching tool. 
Having rendered the corpus in the appropriate form, 
and aligned it so that the system knows for each 
database sentence in a source language A the corre- 
sponding database sentence in a target language B, 
the matching tool can search for database sentences 
of language A that are identical or only similar to an
input sentence (in source language A) and retrieve 
the equivalent sentence or sentences, if more than 
the Translation Memory which is identical to the
input sentence, and 

(ii) extraction of candidate sentences and the fuzzy
match process. The fuzzy match process aims at 
extracting from the TM a number of sentences 
and their translations which resemble the given 
input sentence above a certain minimum degree 
(percentage). 


Perfect match mechanism 
The mechanism that looks for perfect matches is a 
module of the TM system. In the case where a perfect 
match is found, the output of this process is a data- 
base sentence which is identical to the input one. 
The input to the perfect match module is the input 
sentence as annotated by the text handler as well as 
meta-information about the database sentences stored
in the database. The output of the perfect match 
algorithm is a sentence which perfectly matches the 
input sentence, if such a sentence exists. 
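
The perfect match step can be sketched as follows. This is a minimal illustration, not the Translearn implementation: it assumes that extra-linguistic tokens are plain numbers and simple numeric dates, and the names `normalise`, `perfect_match`, `DATE` and `NUMBER` are hypothetical. The example sentences are invented.

```python
import re

# Hypothetical patterns for extra-linguistic tokens; the real text handler
# is not described at this level of detail in the paper.
DATE = re.compile(r"\b\d{1,2}[./-]\d{1,2}[./-]\d{2,4}\b")
NUMBER = re.compile(r"\b\d+(?:[.,]\d+)*\b")


def normalise(sentence):
    """Replace dates and numbers by placeholders and split into tokens."""
    sentence = DATE.sub("<DATE>", sentence)
    sentence = NUMBER.sub("<NUM>", sentence)
    return tuple(sentence.lower().split())


def perfect_match(input_sentence, memory):
    """Return the first (source, target) pair whose source equals the input
    after normalisation, or None if there is no perfect match."""
    key = normalise(input_sentence)
    for source, target in memory:
        if normalise(source) == key:
            return source, target
    return None


memory = [("Regulation No 17 of 3.5.1991 is repealed.",
           "Le règlement no 17 du 3.5.1991 est abrogé.")]
hit = perfect_match("Regulation No 204 of 12.11.1993 is repealed.", memory)
miss = perfect_match("Regulation No 204 is repealed.", memory)
```

Because dates and numbers are replaced before comparison, the two regulation sentences count as a perfect match although their figures differ; the user would still have to update those figures in the retrieved translation.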


Fuzzy match mechanism 

The aim of the second phase of the matching mecha- 
nism is to find a sentence or a set of sentences in TM 
which are as similar as possible to the input sentence. 
This phase of the matching mechanism uses the 
results of the morphological analyser and other meta-information
stored in the database. The output data
of this mechanism is a sentence or a set of sentences 
in TM which are as similar as possible to the input 
sentence. For each database sentence a similarity 
score is computed. The module also computes indi- 
cations of the common parts or words between the 
input and the database sentences as the user needs 
this information in order to adapt efficiently the 
suggested translation. 

The second phase of the matching process is
separated into two stages: 

(i) extraction of candidate sentences. 
(ii) fuzzy match procedure.

The aim of the first stage is to extract a list of 
candidate sentences from the database which have 
some common characteristics with the input sen- 
tence. This stage is used in order to reduce the search 
space and to speed up the system. Numbers of words 
and numbers of content words have been alternately 
studied and used in this stage. The underlying as- 
sumption is that two sentences with the same number 
of words (or content words) may be more similar 
than two sentences whose lengths are different. 
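
As an illustration of this pre-selection stage, the sketch below keeps only database sentences whose length is close to that of the input. The tolerance `max_diff` and the function name `extract_candidates` are assumptions made for the example; the paper only states that numbers of words (or content words) were used.

```python
def extract_candidates(input_sentence, database, max_diff=2):
    """Keep database sentences whose word count is within max_diff words
    of the input sentence's word count (max_diff is an assumed parameter)."""
    n = len(input_sentence.split())
    return [s for s in database if abs(len(s.split()) - n) <= max_diff]


database = [
    "This Regulation shall enter into force on 1 January 1994 .",
    "Done at Brussels .",
    "This Regulation shall be binding in its entirety .",
]
candidates = extract_candidates(
    "This Regulation shall enter into force on 1 July 1995 .", database)
```

Only the candidates that survive this cheap filter need to be passed to the more expensive fuzzy match procedure.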

The aim of the fuzzy match procedure is to compute
the number of common elements and the number
of consecutive common elements between the input 
and the database sentence. In the simplest case an 
element corresponds to a word. The procedure can be
expanded to encapsulate surface linguistic informa- 
tion, if it is available. In such a case, the element is a 
combination of a word and a lemma (and/or a pos 
tag). Furthermore, it computes the similarity score 
between the two sentences compared. 
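
The quantities named above can be sketched with Python's standard `difflib` module. The paper does not give its scoring formula, so the percentage below (twice the number of common elements divided by the total number of elements) is an assumption, as is the function name `fuzzy_match`.

```python
from difflib import SequenceMatcher


def fuzzy_match(input_elems, db_elems):
    """Return (common, longest_run, score) for two element sequences.

    common      -- number of elements lying in matching blocks
    longest_run -- length of the longest run of consecutive common elements
    score       -- percentage; this formula is an assumption, not the paper's
    """
    matcher = SequenceMatcher(None, input_elems, db_elems, autojunk=False)
    blocks = [b for b in matcher.get_matching_blocks() if b.size > 0]
    common = sum(b.size for b in blocks)
    longest_run = max((b.size for b in blocks), default=0)
    total = len(input_elems) + len(db_elems)
    score = 100.0 * 2 * common / total if total else 100.0
    return common, longest_run, score


# Wordform level: an element is simply a word.
a = "the committee shall adopt its rules".split()
b = "the committee shall adopt its own rules".split()
common, run, score = fuzzy_match(a, b)
```

At lemma-tag level the same function could be called with sequences of (lemma, pos) tuples instead of words, since `SequenceMatcher` accepts any hashable elements.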
matching tool searches for database sentences that are
similar to the input, i.e. for fuzzy matches. In doing so, 
the tool considers either wordforms only or surface 
linguistic data (lemmas and tags) in order to search for 
similar sentences and identify their common parts. 
The parts of the database and input sentences that are 
different are computed and displayed to the user so 
that he/she knows where to intervene in the proposed 
translation. In addition, the system computes a simi- 
larity score between the compared sentences, based on 
the importance of the differences between them. The 
similarity threshold is externally configurable: the user can
set a minimum value for the similarity score in
case of fuzzy matches. The modifications that the user
may make to the proposed translations are then stored 
in the system for future use, thus enabling the system 
to learn new translations. 

The input data of the matching mechanism 
are classified in different categories and extracted 
from different modules of the TM system. The input data
comprise:

• the string of characters of the input sentence

• the sentences as annotated by the text handler.
Extra-linguistic tokens, like numbers and dates
appearing in a sentence, are annotated as such by
the text handler

• lemmas and part-of-speech (pos) tags, i.e. grammatical
categories, as extracted from the linguistic
processors

• the database sentences and their translations. Furthermore,
other pieces of information such as the
position of the words in the sentences and the number
of characters and words of a sentence are used in
order to accelerate the matching process

• the minimum similarity score, a boundary value for the
similarity score that is given by the user. Matches
whose scores fall below this threshold
value are rejected by the matching mechanism.

The output of the matching mechanism is: 

• a database sentence or a set of database sentences
that have a certain similarity to the input sentence.

• similarity scores. Each of the database sentences
which is close to the input sentence is associated
with a similarity score so that alternative solutions
are accordingly ranked. The similarity score expresses
the degree of similarity between the input
and the database sentence. The greater the similarity
score, the more similar the sentences. The
similarity score is expressed as a percentage value.
A 100% match means that there is a sentence in the
TM which is identical to the input sentence

• common words and parts of sentences between the
input and the database sentences. This information
is provided to the user so that he or she can later
adapt the suggested translation in an efficient
manner.
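
The ranking and thresholding of alternatives described above might look as follows. This is a sketch under assumptions: the score is `difflib`'s word-level ratio used as a stand-in for the system's own measure, `rank_matches` and `min_score` are hypothetical names, and the translation pairs are invented.

```python
from difflib import SequenceMatcher


def rank_matches(input_sentence, memory, min_score=60.0):
    """Return (score, source, target) triples at or above min_score,
    best match first. memory is a list of (source, target) pairs."""
    words = input_sentence.split()
    results = []
    for source, target in memory:
        matcher = SequenceMatcher(None, words, source.split(), autojunk=False)
        score = 100.0 * matcher.ratio()  # stand-in similarity measure
        if score >= min_score:           # reject matches below the threshold
            results.append((score, source, target))
    return sorted(results, key=lambda r: r[0], reverse=True)


memory = [
    ("the committee shall adopt its own rules",
     "le comité adopte son propre règlement"),
    ("the committee shall adopt its rules",
     "le comité adopte son règlement"),
    ("done at Brussels", "fait à Bruxelles"),
]
ranked = rank_matches("the committee shall adopt its rules", memory)
```

The identical sentence is ranked first with a score of 100%, the near match follows, and the unrelated sentence is rejected because its score falls below the threshold.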

The matching mechanism consists of two 
processes: 

(i) the perfect match process by which the system
finds a database sentence (and its translation) in 
Interactive corpus-based translation drafting tool (Translearn)


• the TL sentence associated with the SL in the TM
and, if possible, the matching segments between 
the SL and the TL. 

In cases where fuzzy matches accepted by the user 
are found, the user is asked to render in the target 
language those parts of the SL sentence that have not 
matched. In this way, the user can render the exact 
translation of the input sentence he wants to translate, 
reusing the existing translations for parts of it. The new 
emerging pair of translation units is then stored in the 
translation memory database for future use. In cases 
where no match can be found, including cases where 
matches exist but their score is below the user's desired 
threshold, the user is asked to provide the translation of 
the IS which is again subsequently stored in the TM 
database. Thus, the translation memory system starts 
learning new translation pairs in an interactive mode. 
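
The interactive learning loop can be illustrated with a deliberately small sketch. It covers only the two extreme cases (a stored sentence is reused, an unknown sentence is translated by the user and stored); fuzzy matches are left out. The class `TranslationMemory`, the function `translate` and the callback `ask_user` are hypothetical names.

```python
class TranslationMemory:
    """In-memory store of source sentence -> target sentence pairs."""

    def __init__(self):
        self.pairs = {}

    def lookup(self, source):
        """Return the stored translation, or None if the sentence is new."""
        return self.pairs.get(source)

    def store(self, source, target):
        """Add or overwrite a translation pair."""
        self.pairs[source] = target


def translate(tm, sentence, ask_user):
    """Reuse a stored translation; otherwise ask the user and remember it."""
    known = tm.lookup(sentence)
    if known is not None:
        return known
    target = ask_user(sentence)  # the user supplies the translation
    tm.store(sentence, target)   # the memory 'learns' the new pair
    return target


tm = TranslationMemory()
calls = []


def fake_user(sentence):
    """Stand-in for the interactive user; records each request."""
    calls.append(sentence)
    return "Fait à Bruxelles."


first = translate(tm, "Done at Brussels.", fake_user)
second = translate(tm, "Done at Brussels.", fake_user)
```

The second call is answered from the memory, so the user is consulted only once.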


System architecture 

The above described tools have been implemented in 
the Translearn environment. Figure 3 shows the con- 
figuration of the Translearn tools and their 
communication. 


Exemplary cases of fuzzy matches computed by 
the matching tool include: (Sa, Sb, Sc, Sd stand for 
segments of sentences extending over a number of 
words identified in the input sentence (IS) and database
source language (SL) sentences).


IS : Sa Sc 
SL: Sa Sb Sc 


IS : Sa Sb Sc 
SL: Sa Sb 


IS : Sa Sb 
SL: Sa Sb Sc 


IS : Sa Sb Sc 
SL: Sa Sc 


IS : Sa Sb Sc 
SL: Sa Sc Sb 


IS : Sa Sb Sc 
SL: Sa Sb Sd 





Figure 2: Possible fuzzily matching segments 


The input to the fuzzy match module is: 
• the elements of the input sentence
• the elements of the database (candidate) sentence.
The output of the fuzzy match module is:
• a similarity score
• if the similarity score is greater than the threshold
the user has set, the matching parts of IS and SL
appropriately marked
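
How the matching and differing parts of IS and SL could be marked is sketched below for two of the segment patterns of Figure 2. The sketch uses the opcodes of Python's `difflib.SequenceMatcher`; this is a substitute technique chosen for illustration, and `mark_differences` is a hypothetical name.

```python
from difflib import SequenceMatcher


def mark_differences(is_segments, sl_segments):
    """Return a list of (tag, IS part, SL part) tuples, where tag is one of
    'equal', 'replace', 'delete' or 'insert' (difflib's opcode names)."""
    matcher = SequenceMatcher(None, is_segments, sl_segments, autojunk=False)
    return [(tag, is_segments[i1:i2], sl_segments[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()]


# IS: Sa Sc      versus  SL: Sa Sb Sc  (SL contains an extra segment)
case_extra = mark_differences(["Sa", "Sc"], ["Sa", "Sb", "Sc"])
# IS: Sa Sb Sc   versus  SL: Sa Sb Sd  (last segment differs)
case_subst = mark_differences(["Sa", "Sb", "Sc"], ["Sa", "Sb", "Sd"])
```

Parts tagged `equal` correspond to the common segments shown to the user, while the other tags identify where the proposed translation needs intervention.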

Figure 3: The Translearn Tool Configuration

The Translearn tools are integrated in a translation
environment operating on a client-server
architecture. In the standard client-server architecture,
one or more clients and one or more servers,
along with the underlying operating system and
interprocess communication systems, form a composite
system allowing distributed computation,
analysis and presentation. Each client runs an application
on a workstation but accesses the database
through the server. This process is depicted in Figure 4.


[Figure 4 diagram labels: Translation Memories; Server DBMS; Communication Software; CS server; Communication Software; Application Software]


linguistic information that text matching demands. 
The server (UNIX based) stores the multilingual 
corpus meta-data (linguistic meta-data, statistical 
and alignment data) and transmits them over the 
network upon a client's request. 


Application 

Translearn has collected and investigated a large 
body of parallel ASCII texts, between 5 and 6 million
words for each language: English, French, Portuguese

Figure 4: Client-server architecture


The client presents a graphical user interface 
(Microsoft Windows-based). This interface is the 
sole means of garnering user translation requests, as 
well as the means of presenting the results of one or 
more translation alternatives. In the translation envi- 
ronment, the client performs the necessary handling 
of a text that a user has opened in order to translate, 
without any involvement of the server. Furthermore, 
in real mode operation, the client invokes the appro- 
priate linguistic processors, if available, to fill in the 
face linguistic information in order to establish correspondences
between phrases/clauses across
multilingual texts. The alignment software is used
not only for translation data preparation but also constitutes
an integral utility of the Translearn environment,
so that if the future user has translated texts available
he or she will be able to align them, store them and
reuse them. 

The corpus has been lemmatized and tagged at 
part-of-speech (pos) level. Lemmatization is per- 
formed by access to a morphological dictionary. The 
tagsets used are compatible with the TEI and NERC 
guidelines, catering at the same time for the peculi- 
arities of each language. Lemmatization and tagging 
return for each word of the text the combination 
«lemma, pos». If multiple such combinations are 
valid for a word, then all possible combinations are 
output. Combinations of more than one «lemma, 
pos» tuples are then grouped together to form a 
morphologically ambiguous class and these ambigu- 
ity classes are treated as tags of their own. Lemma 
and pos tag information is later utilized in the text 
matching process in order to determine identical or 
similar sentences and subsequently rank their simi- 
larity. 
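
The grouping of multiple analyses into an ambiguity class can be illustrated as follows. The string format of the class label and the function name `ambiguity_class` are assumptions made for the example, and the analyses shown are invented.

```python
def ambiguity_class(analyses):
    """Combine all (lemma, pos) analyses of one wordform into a single tag.

    The analyses are de-duplicated and sorted so that the same set of
    analyses always yields the same class label.
    """
    unique = sorted(set(analyses))
    return "|".join(f"{lemma}/{pos}" for lemma, pos in unique)


# Hypothetical analyses of the English wordform 'book'
tag = ambiguity_class([("book", "VERB"), ("book", "NOUN")])
# An unambiguous wordform yields a class with a single member
single = ambiguity_class([("the", "DET")])
```

Treating the whole class as one tag means that two wordforms can be compared at lemma-tag level without first resolving the ambiguity.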

In Figure 5, we illustrate the text matching proc- 
ess operating on French-English. The example 

and Greek. The corpus has been extracted from the 
CELEX database, the European Union's (EU) docu- 
mentation system on EU law. The characteristics of 
the administrative sublanguage span the whole cor- 
pus, while technical/financial sublanguage is used 
depending on the subject matter of each text. The 
corpus texts are of regulatory type with slight variations,
while the structure of almost all texts is the
same. The corpus by itself validates the usefulness of 
the project by the high percentage of frequently 
recurring pieces of text that need not be retranslated 
since one can reuse existing human translations. In 
parallel, samples of texts extracted from software 
manuals have been studied revalidating the useful- 
ness of the approach. 

The corpora have also been aligned so that each 
paragraph and sentence in the French, Portuguese 
and Greek version is linked to the corresponding 
paragraph and sentence of the English version. The 
alignment software that was developed, based on 
techniques considering mainly statistical informa- 
tion, computed 96% correct alignments while 
methods for improvements and increasing robust- 
ness are currently being explored. Experiments for 
alignment below the level of sentence have also been 
made, yielding promising results. The new methods 
combine the power of statistical modelling and sur- 
sentences are taken from the CELEX application
corpus. In the upper window, the input sentence to be
translated is presented, in the middle window the
database sentence best matching the input and in the
lower window the translation equivalent of the database
sentence (that of the middle window). In the
upper and middle windows, the differing segments
are shown in different colours. In addition, segments 
having the same lemma form are indicated by differ- 
ent colours. In this way, the system indicates to the 
translator which segments have to be changed and 
what types of changes have to be made. In the lower 
window the translator can then make the appropriate 
changes, adopt and store the new translation pair in 
the database, thus enabling the system to ‘learn’ new
translations. 
Abstract


This paper introduces the topic of evaluation of natural language processing systems, and discusses the role of 
test suites in the linguistic evaluation of a system. The work on test suites that is being carried out within the 
framework of the TSNLP project is described in detail and the relevance of the project to the evaluation of machine 


Likewise, if a user wants to know not just how a 
product behaves today, but also what its future potential is,
then he or she will be interested in performing a 
diagnostic evaluation. A diagnostic evaluation for 
the developer will, however, differ from that of the 
user in that the user is typically in a black box 
situation with respect to the system (i.e. he or she 
does not have access to its internal workings), while 
a developer will be in a glass box situation, where he 
or she will have access to the system rules. 


The role of test suites in evaluation 
Traditionally there are two main ways of evaluating 
NLP systems, either by the use of test corpora (i.e. 
pieces of text) or by test suites (i.e. lists of specially 
constructed sentences, or sentence sequences or even 
sentence fragments). Traditionally, test suites are the 
preferred option of the system developer, since he or 
she wants to see how this system will perform on a 
range of controlled examples. And traditionally the 
user prefers to test a system against a test corpus that 
has usually been selected to be representative of the 
texts he requires his NLP system to process. This is 
because test suites are useful for diagnostic evalua- 
tion, whereas test corpora are a tool traditionally 
associated with adequacy evaluation. But as the pre- 
vious paragraph should have made clear, test suites 
could equally prove useful to the user if he or she
undertakes diagnostic evaluation. 
Test suites and test corpora have different roles to 
play in evaluation and should be seen not as compet- 
ing tools, but rather as complementary. Test suites 
are useful for presenting language phenomena in an 
exhaustive and systematic way. Thus, for example, 
each different type of noun phrase or adjective phrase 
can be listed, starting with the simplest and increas- 
ing in complexity. Furthermore, combinations of 
phenomena can be generated in a controlled fashion. 
For example, coordinated noun phrases can be pro- 
duced on the basis of simple noun phrases. Negative 
data, likewise, can be derived systematically from 
positive data by violating grammatical constraints 
associated with the positive data item. For example, 
violation of determiner-noun agreement in English 
translation systems considered.


Introduction 

Evaluation is a topic that is currently attracting a 
great deal of interest in the natural language process- 
ing community. The science of evaluation is however, 
relatively speaking, in its infancy. Historically, the 
United States have been ahead of Europe with their 
ARPA and DARPA Speech and Natural Language 
program which started in 1984. Current initiatives in 
evaluation in Europe include the work of EAGLES 
(Expert Advisory Group on Language Engineering 
Standards), which was set up in 1993 and whose 
primary goal is to improve evaluation methods as a 
step towards setting up standards for language engi- 
neering products. 

The Commission of the European Communities 
is also, within the context of its Linguistic Research 
and Engineering (LRE) program, currently sponsor- 
ing several projects in the field of evaluation, 
including the project Test Suites for Natural Lan- 
guage Processing (TSNLP) which is the subject of 
this paper. TSNLP shares with some other LRE
evaluation projects the aim of producing a collection 
of common test materials. In the case of TSNLP, this
constitutes a set of reusable test suites for a range of 
applications. Further aims of TSNLP are described 
below. 


Evaluation: some terminology 

It is customary to define a number of different evaluation
scenarios, depending on the purpose of the
evaluation. EAGLES, for example, distinguishes the
following three types of evaluation:

• diagnostic evaluation, which aims at localizing
deficiencies;

• progress evaluation, for a comparison between
successive stages of development of a system; and

• adequacy evaluation, to determine whether and to
what extent a particular system meets some
pre-specified requirements.

Developers are chiefly interested in diagnostic 
and progress evaluation, while users are mainly in- 
terested in adequacy evaluation. However, if 
developers aim eventually to market their products 
then adequacy evaluation is an issue for them too. 


Test suites for natural language processing 





Way¹⁰, Arnold et al.?, Chapter 9 provides a general
introduction, and useful discussions on the role of 
test suites in the evaluation of machine translation 
systems can also be found in King and Falkedal⁷. Of
course, test suites can be used straightforwardly in 
the evaluation of many machine translation system 
components (e.g. syntactic parsers). However, their 
use in relation to machine translation systems raises
a number of interesting issues.


First, as King and Falkedal⁷ point out, most existing
test suites are designed for monolingual
applications. However, in the case of machine trans- 
lation systems, ‘bilingual’ test suites are required 
that probe the capacity of systems to deal with par- 
ticular translation problems (such as the problem of 
lexical and structural mismatch, e.g. the classic ‘like 
— plaire’ case, where the arguments of the verb are 
reversed in translation: John likes Mary translates as 
Mary plaît à John). Such ‘bilingual’ test suites will
have to be specially constructed, and in general their 
construction requires some rather detailed insight 
into the nature of translation problems. Of course,
‘bilingual’ test suites must be distinguished from any 
test suites that are to be used to test purely monolin- 
gual components, where the test items should be 
translationally unproblematic, so that they do not 
introduce irrelevant difficulties. 

Second, as with generation systems, there is what 
one might call the ‘output’ problem. For some appli- 
cations, one is only interested in whether a system 
accepts or rejects a test item. For such applications, 
the evaluation process can be automated and a high 
degree of objectivity (relative to the particular test 
suite) is possible. With a machine translation system 
this is not the case: one is typically interested not just
in whether a system accepts an input, but also in the 
correctness of the output it produces. Of course, one 
cannot simply specify what the ‘correct’ translation 
of any particular test item is — there is in general no 
single ‘correct’ translation of any expression. This 
makes the evaluation process rather subjective, and 
difficult to automate. One interesting suggestion here 
(from Henry Thompson⁹, and currently being investigated
as part of a research project in Edinburgh) is to
assume that, though a wide range of translations may 
be possible, ‘good’ translations will tend to be more 
similar to each other than bad ones — ‘good’ transla- 
tions will tend to cluster together. As regards test 
suites, a possible application of this idea would be to 
associate a ‘central’ member of this cluster with each 
test item, and compare this to what the system under 
test actually produces. If the degree of difference is 
within the range that one finds among the cluster of 
‘good’ translations, one may assume that the system 
has performed satisfactorily on this item. 
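
One possible reading of this suggestion is sketched below. It is not Thompson's method: the distance measure (one minus `difflib`'s word-level ratio), the choice of the central member as the translation with the smallest total distance to the others, and all function names are assumptions made for illustration. The French sentences other than the one quoted above are invented.

```python
from difflib import SequenceMatcher


def distance(a, b):
    """Word-level dissimilarity in [0, 1]; a stand-in measure (assumption)."""
    matcher = SequenceMatcher(None, a.split(), b.split(), autojunk=False)
    return 1.0 - matcher.ratio()


def central_member(cluster):
    """Return the translation with the smallest total distance to the others."""
    return min(cluster, key=lambda t: sum(distance(t, u) for u in cluster))


def acceptable(output, cluster):
    """True if the output is no further from the central member than the
    most distant 'good' translation in the cluster."""
    centre = central_member(cluster)
    spread = max(distance(centre, t) for t in cluster)
    return distance(centre, output) <= spread


# 'Good' translations of "John likes Mary"
good = [
    "Mary plaît à John",
    "Mary plaît beaucoup à John",
    "Mary plaît bien à John",
]
ok = acceptable("Mary plaît à John", good)
bad = acceptable("John aime Mary", good)
```

A purely word-based measure such as this one would also reject legitimate translations that use different vocabulary, which is one reason why the choice of similarity measure matters for this kind of evaluation.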

Finally, test suites need to be supplemented by 
corpus methods to test semantic and pragmatic phe- 
nomena. Despite these limitations, test suite based 
evaluation is unquestionably a useful component in 
the evaluation of machine translation systems, both 
produces ill-formed examples such as those heavy
book or that heavy books. Note that in test suites, the
vocabulary, as well as the sort of construction being 
tested, can be controlled. This allows the evaluator to 
focus on the way the system deals with the construction
without the distraction of problems relating to
lexical coverage.
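
The systematic derivation of negative data from positive data can be illustrated with a small sketch. The tiny lexicon and the function name `noun_phrases` are hypothetical; the example only covers the determiner-noun agreement case discussed above.

```python
# Tiny hypothetical lexicon: each entry is (form, number).
DETERMINERS = [("that", "sg"), ("those", "pl")]
NOUNS = [("book", "sg"), ("books", "pl")]


def noun_phrases(adjective="heavy"):
    """Yield (phrase, is_grammatical) for every determiner-noun combination.

    A phrase is grammatical when determiner and noun agree in number;
    all other combinations are negative test items.
    """
    for det, det_num in DETERMINERS:
        for noun, noun_num in NOUNS:
            yield f"{det} {adjective} {noun}", det_num == noun_num


items = list(noun_phrases())
positive = [phrase for phrase, ok in items if ok]
negative = [phrase for phrase, ok in items if not ok]
```

Because the vocabulary is fixed, every item differs from its neighbours only in the agreement feature under test.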

Test corpora, on the other hand, lack the 
exhaustivity and systematicity of test suites. Further- 
more, the complexity of many naturally occurring 
phenomena can make it difficult to isolate the exact 
phenomenon or phenomena that one is interested in 
testing. The task is not helped by the fact that most 
corpora lack any sort of annotation. So, what are the
strengths of the test corpora method? Well, firstly, as 
already mentioned, test corpora represent naturally 
occurring data, so that one can be sure that the 
phenomena one is testing for really do occur. A 
criticism that can be levelled against the test suite 
technique is that some of the phenomena never ever 
occur in real life. Note, however, that it is a non-trivial 
task to ensure that a test corpus is representative of a 
larger corpus. Text processing tools can give some 
idea of frequency of phenomena and lexica, sentence 
length, etc. but the problem is still a hard one. 

We said above that test suites and test corpora are 
complementary techniques. The test suite method is 
particularly useful for testing syntactic phenomena 
(see for example the Hewlett Packard test suite in 
Flickinger et al.², perhaps the best known test suite to
date), where the range of phenomena is relatively 
well understood and well documented. Semantic and 
pragmatic phenomena are less accessible to the test 
suite method, since the phenomena are less easy to 
characterize, and are frequently context-dependent. 
This means that many phenomena, such as anaphora 
resolution, need to be tested within a sequence of 
sentences, rather than in isolated sentences. This is 
where test corpora are useful, because they just are a 
sequence of sentences. Some suggestions for what 
should go into a semantic test suite are discussed in 
Hoard’.

It is also the case that some applications are less 
well suited to being tested by the test suite method 
than by test corpora. Message understanding systems,
for example, need whole sequences of sentences
as input, so are better suited to the test corpora 
method. Test suites are useful for any system which 
has a large syntactic analysis component. Further- 
more, they are best suited to applications where it is 
possible to specify not just the nature of the input, but 
also the nature of the output. A good example is a
grammar checker. Generation systems, on the other 
hand, are less well-suited to this method, since it is 
difficult to specify not only what the input to a 
generation system should be, but also what consti- 
tutes an appropriate output. 

There is a long tradition of using test suites to 
evaluate machine translation systems. Recent exam- 
ples include Gamback^, Heid and Hildenbrand’, and 
hand. The project will take as a starting point
work by Arnold et al.' on test suite generation 

— a lexical replacement tool. This will be helpful 

in the customization that will be necessary to 
test system performance against a user's own 
corpora. 

TSNLP began by reviewing publicly available 
test suites, to see in which ways test suite design 
could be improved. 

Despite the frequent reference to test suites in the 
NLP literature, surprisingly few test suites are pub- 
licly available. The test suites investigated differed 
greatly with respect to: 

• purpose (diagnostic/adequacy/progress evaluation)

• intended application (parsers, MT systems, etc.)

• depth and breadth of coverage

• presentation of data.

TSNLP is above all interested in producing a test 
suite that is flexible and reusable. The review of 
existing publicly available test suites revealed the 
following characteristics that are important for flex- 
ibility and reusability: 

• systematic annotation scheme: an explicit characterization
of the test data, not merely section
headings

• support tools: software tools to assist in the creation
or use of test suites

• documentation: documentation is useful on both
the design and content of the test suite.

Few of the test suites we examined or which are 
reported on in the literature contain any or all of these 
characteristics. They are, however, a key focus of
TSNLP. Systematicity, as we have seen, is important 
for negative as well as positive data, and for combi- 
nations of phenomena. However, in the case of 
negative data and combinations of phenomena, the 
possibilities are numerous and some method is needed 
for their selection. One selection criterion might be, 
for example, frequency of occurrence. A proper annotation
scheme is required, not just in view of the
database, but to make the data maximally explicit 
and therefore reusable in general. 
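
What an explicitly annotated test item might look like is sketched below. The field names, the class `SuiteItem` and the helper `select` are purely illustrative assumptions and do not reproduce the TSNLP annotation scheme.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SuiteItem:
    """One annotated test item (hypothetical fields)."""
    text: str
    language: str     # e.g. "en", "fr", "de"
    phenomenon: str   # e.g. "det-noun agreement"
    wellformed: bool  # False for negative data


def select(items, **criteria):
    """Return the items whose attributes equal all the given criteria."""
    return [item for item in items
            if all(getattr(item, key) == value
                   for key, value in criteria.items())]


suite = [
    SuiteItem("those heavy books", "en", "det-noun agreement", True),
    SuiteItem("those heavy book", "en", "det-noun agreement", False),
    SuiteItem("the book and the pen", "en", "NP coordination", True),
]
negative = select(suite, phenomenon="det-noun agreement", wellformed=False)
```

With annotations of this kind, a user can retrieve exactly the configuration of test data needed for a given evaluation, rather than relying on section headings.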

The availability of validated test data that are 
fully annotated and accessible, by means of the 
database, is expected to be of benefit to developers 
and users of NLP products, even outside the applica- 
tions for which the data are principally designed 
(i.e. grammar checkers, controlled language check- 
ers and parsers). Test suites as a tool are, as we have 
discussed, of interest to anyone, developer or user, 
who is interested in diagnostic evaluation. Test suites,
as we have seen, are most useful for systems that 
contain a large syntactic component, and this in- 
cludes many MT systems. The multilingual nature 
of the project means that it should be possible to 
extract parallel data across different languages and,
potentially, locate where there is non-parallelism in
structure. Other phenomena of importance to MT, 
such as lexical mismatches, however, remain out- 
side the scope of the present project. 
for developers and end users. It is to be expected that
the development of multilingual test suites, as in the
present project, will be a useful step towards over- 
coming these limitations, and making them more 
useful still. 


The TSNLP project 
The aim of TSNLP is to develop a methodology for 
the design and development of test suites, and to 
produce test suites for a range of NLP applications. 
These test suites will be of medium size (several 
hundred items) for English, French and German. The 
applications are, specifically, parsers, grammar check- 
ers and controlled language checkers, all of which 
contain large syntactic components, and are thus, as 
we have seen, particularly suited to the test suite 
method of evaluation. However, it is expected that 
the results will be usable for other application types. 
The fact that the data are being constructed in three 
languages (English, French and German) means that 
they should be of particular relevance to multilingual 
applications, including machine translation. The re- 
sults of the project, both scientific reports and actual 
test suites, will be in the public domain. 

The project started in December 1993 and has a 
duration of 20 months. The partners involved are 
The University of Essex, UK, who are the coordinators,
plus Aerospatiale, France, Deutsches Forschungszentrum
für Künstliche Intelligenz GmbH
(DFKI), Saarbrücken, Germany, and Istituto per
gli Studi Semantici e Cognitivi (ISSCO), Geneva,
Switzerland. 

This project has the following aims: 

• to define a set of guidelines for the construction of
test suites for a range of NL products, including
machine translation systems, concentrating on
grammar checkers, parsers and controlled language
checkers

• to produce substantial test suite fragments covering
core syntactic phenomena in three languages
(English, French and German). The project includes
a testing phase for each of the three
applications and revisions to the guidelines are
foreseen in the light of test results

• to identify and develop a number of tools which
will facilitate the construction and use of test
suites, namely:

— a database in which the test suite will be stored
which will allow easy access and manipulation
of the data. TSNLP is inspired by the DITO test
collection (see Nerbonne et al.⁸) in its use of a
database on which to mount and manipulate the 
data. The aim is to make the test data easy to 
access and flexible in the type of configuration 
that can be retrieved 

— an automatic test suite generation tool. Little 
previous work has been done on the automatic 
generation of test suites, but the endeavour seems 
worthwhile, given the labour-intensive and er- 
ror-prone business of constructing test suites by 
workshop, Report RL-TR-91-362, Berkeley, New
York: Rome Laboratory. 


7. KING, M. and FALKEDAL, K., 1990, Using test 
suites in the evaluation of machine translation sys- 
tems, in Proceedings of the 13th International 
Conference on Computational Linguistics (COLING), 
Helsinki. 

8. NERBONNE, J., NETTER, K., KADER DIAGNE, 
A., KLEIN, J. and DICKMAN, L., 1992, A diagnostic
tool for German syntax, Report DFKI D-92-03,
Saarbrücken. Also in: NEAL, J. and WALTER, S. 
(eds.) 1991, Natural language processing systems 
evaluation workshop, Report RL-TR-91-362, 
Berkeley, New York: Rome Laboratory. 


9. THOMPSON, HENRY, S., 1991, Automatic 
evaluation of translation quality: outline of method- 
ology and report on pilot experiment, Proceedings 
of the Evaluators' Forum, Les Rasses, April 21-24 
1991, available from ISSCO, University of Geneva,
Switzerland, pp. 215-224.


10. WAY, A., 1991, A practical developer-oriented
evaluation of two MT systems, Department of Lan- 
guage and Linguistics Working Papers in Language 
Processing, 26, Colchester, UK: Department of Lan- 
guage and Linguistics. 

Footnote: We would like to thank our colleagues in 
TSNLP for fruitful discussions on this topic: Eva 
Dauphin, Dominique Estival, Kirsten Falkedal, Sabine 
Lehmann, Klaus Netter and Sylvie Regnier-Prost. 
Developments in Systran
Dorothy Senez 


TRIA 2/164 — Translation Service, Commission of the European Communities, 200 rue de la Loi, 1049 Brussels. 


Abstract

Systran, the European Commission's multilingual machine translation system, is a fast service which is available 
to all Commission officials. The computer cannot match the skills of the professional translator, who must 
continue to be responsible for all texts which are legally binding or which are for publication. But machine 
translation can deal, in a matter of minutes, with short-lived documents, designed, say, for information or 
preparatory work, and which are required urgently. It can also give a broad view of a paper in an unfamiliar 
language, so that an official can decide how much, if any, of it needs to go to translators. In this way, much time 
can be saved for a translation service which is already facing a relentless increase in the volume of its work and 
which will have to cope with the new languages of an enlarged European Union. We have set up a post-editing 
service to correct machine texts for users who cannot do this in their own departments. Raw machine translation 
is only one of a number of multilingual services now being made available. The switch to personal computers 
throughout the Commission, and the greater use of increasingly reliable electronic mail, also means that other 
forms of help can be given. First, a bridge has been created between Systran and Celex (the multilingual data 
base containing Community legislation). Secondly, and only in recent months, Eurodicautom (the Commission's
multilingual terminology data bank) has been incorporated in the Systran dictionaries. With this link, it will be 
easy to look up technical terms in a given language and have them returned in one or more other languages. A 
survey has shown how officials use Systran and has enabled us to identify their needs. In all these ways, Systran 
is making excellent progress as a means of rapid communication between the many departments of a multilingual 
Commission. Our aim is to enhance the quality of Systran, to broaden its range to the languages of the
Community and to explain and vigorously promote its use. 


increase in the output of raw machine translation. 
Much of this may be attributed to an intensive pro- 
motion campaign conducted in tandem by the 
Translation Service and Directorate-General XIII 
(Telecommunications, Information Market and Ex- 
ploitation of Research). Informative brochures on 
machine translation were distributed to all members 
of staff; posters have been put up in all Commission 
buildings; MT users can call a help-desk should they 
have any queries or difficulties; and regular visits are 
paid to user departments to answer questions about 
the system and to discuss the possibility of introduc- 
ing specific terminology. 


Systran and the professional translator 

Professional linguists are no doubt wondering about 
the reactions of in-house translators to this exponen- 
tial rise in MT. Well, one can hardly say they have 
been enthusiastic. For, as long as MT was felt to be 
a substitute for human translation, it was unlikely it 
would be welcomed with open arms. Linguists can 
be reassured. The computer can never match the 
skills of the translator. The aim of our promotion 
campaign was not to seek to impose machine transla- 
tion on professionals who knew full well that raw 
machine translation as such was not going to revolutionize
their daily work, nor to advertise the translating
machine as a competitor, but to allow non-linguist 
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Use and users of raw machine translation 
Increased use of MT 
I intend to discuss, mainly, new progress in Systran, 
much of which has been stimulated by the recom- 
mendations of the 1991 Oakley Report. This was an 
evaluation of the Commission's Multilingual Action 
Plan. But first, let me comment on the growth of raw 
machine translation in the Commission. In 1988 only 
4,000 pages of MT were processed. By 1993, this 
figure had risen to 120,000 pages. The widespread 
use of machine translation in the Commission is 
therefore a fairly recent phenomenon. Previously the 
technical conditions for its use were not met. Docu- 
ments were not prepared in a form that the machine 
could read. Direct access to the system was not 
possible. There was no clear determination by man- 
agement to promote MT. Not enough effort went into 
marketing it. On the technical side, these past five 
years have seen important developments in the Com- 
mission's use of computers. Input for MT is text that 
can be read by a machine. The main difficulty in the 
past was the cost of getting text to the computer. 
Now, with electronic files, it is fairly easy to send 
documents from one department to another by elec- 
tronic mail. Scanning equipment is also more readily 
available. 

Between the first six months of 1993 and the 
corresponding period of 1994, there was a 47%
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It is with the express aim of providing a better 
backup infrastructure for MT users that a post-edit- 
ing service is being established. The recent survey on 
the use of MT at the Commission showed that there 
are a very large number of MT users who correct 
their texts themselves or who have them corrected by 
colleagues who are fluent in the target language. 
Post-editing has been set up to offer additional help 
to these users, and to those who do not have the 
ability to post-edit within their own departments. 
These people are now able to call on a rapid post- 
editing service which relies on a network of freelance 
translators who have expressed an interest in this 
type of work. Not only are MT users relieved of the
time-consuming task of correcting their raw translations,
but tighter linguistic checks can be kept on the
treatment of urgent texts. The quality offered by this 
service is at a level which is acceptable for purposes 
other than publication (working documents, internal 
notes, etc.). There are numerous examples of users
who are quite happy to accept a document in less 
than pristine prose, provided they can have it when 
they really need it. 

This preselection of the right type of text is of the 
first importance. Requests for the post-editing serv- 
ice are examined carefully to ensure that they are in 
fact suitable for this kind of treatment. In our insti- 
tution these will mainly be ephemeral documents of 
a routine, administrative nature, such as technical 
fiches, preparatory working documents, minutes of 
meetings, studies, and so forth. Users of Systran 
have expressed a keen interest in this service, pro- 
vided we can respect very tight deadlines. Since 
speed is of the essence, the documents are transmit- 
ted to and from requesting departments and freelance 
post-editors entirely by electronic mail. Requesters 
make their own assessment of translations. Users of 
the scheme are well satisfied and deadlines are 
being met. The language pairs most used, in view of 
their higher quality, are English-French and French- 
English. 

So far the service has been kept within very 
modest proportions, the ‘grapevine’ being our main 
source of custom. At the moment, with no further 
promotion it is estimated that production for 1994 
should be in the region of 3,000 pages. Freelance 
resources are limited at the present time and a call 
for tenders is envisaged to set up a network of 
post-editors in order to intensify and promote the 
service.

What do we ask of our freelance post-editors? An 
MT user is not looking for perfection. He needs 
information in another language quickly. What the 
post-editing service offers is not really translation, 
nor indeed revision. The freelance post-editor is
simply aiming to produce a grammatically correct 
text without undue attention to stylistic detail. The 
amount of correction depends on the skill of the post- 
editor, but above all, on the quality of the raw 
translation, which can vary considerably with the 
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staff in the various departments to help themselves to 
machine translation as and when they needed it. MT 
complements, rather than substitutes for, the service
translators provide. A clear distinction is drawn between
the product of the machine and the work of the 
translator. All texts which are legally binding, or 
which are meant for publication must remain in the 
hands of the professionals. Machine translation can 
be entrusted with only short-lived documents de- 
signed, say, for information or preparatory work, and 
which are required urgently. 

The Translation Service already faces a relent- 
less increase in the volume of its work. If the output 
of Systran in 1993 was 120,000 pages, that of the 
translators was a million pages. And the service will 
have to cope with the new languages of an enlarged 
European Union. But it is not only the language 
service which is under great pressure. Multi- 
lingualism is a permanent feature of the daily grind 
in the various directorates-general. Say, for example, 
there is a British official whose French-speaking 
boss requires him to read, even draft, documents in a 
language other than his own. A memo is needed that 
day and he would have to wait his turn in the queue 
for that excellent job from the Translation Service. 
But help is at hand from Systran, which can give him 
a rough-and-ready version of his text in an average 
time of six minutes. MT does fulfil a useful role here 
as a makeshift solution when deadlines are tight and 
there are last-minute documents to produce, some- 
times in several languages. Then, take the case of an 
official who is presented with a lengthy document in 
an unfamiliar language. Systran can quickly give 
him a broad view of its contents to browse over. 
According to his needs he may then ask for a profes- 
sional translation of the whole paper or of only a few 
paragraphs, or he may discard it altogether. In this 
way the Translation Service is saved much unneces- 
sary work. I will have something to say, later, about 
the use of Systran by the translators themselves. 


Post-editing service 

Now that machine translation is so freely available, it 
becomes essential to monitor its use and provide 
appropriate backup measures. Special care has been 
taken in all publicity material to stress that machine 
translation should be reserved for ephemeral docu- 
ments and is to be considered as a stopgap solution to 
language difficulties encountered in day-to-day work. 
Inevitably, the unthinkable happens. A raw translation
slips through the net and finds its way onto a Direc- 
tor's desk in the form of an official document. To 
facilitate immediate identification of a raw text a 
warning message ‘!!RAW MACHINE TRANSLA- 
TION!!’ now appears every 300 words or so in the text 
to lessen the risk of any confusion. It is also recom- 
mended that when a post-edited document is used for 
further distribution, a header is inserted drawing the 
reader's attention to the fact that he is reading the 
humanly revised output of a machine. 
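
The periodic warning described above can be illustrated with a short sketch. It is not the routine used in Systran itself: the function name, the exact 300-word interval and the placing of the banner on its own line are assumptions made for the example, and the original layout of the text is not preserved.

```python
WARNING = "!!RAW MACHINE TRANSLATION!!"


def mark_raw_output(text: str, interval: int = 300, banner: str = WARNING) -> str:
    """Insert `banner` before every block of `interval` words.

    Simplification: the text is re-joined with single spaces, so the
    original line breaks and formatting are not preserved.
    """
    if interval < 1:
        raise ValueError("interval must be at least 1")
    words = text.split()
    blocks = []
    for start in range(0, len(words), interval):
        blocks.append(banner)  # warning first, then the next block of words
        blocks.append(" ".join(words[start:start + interval]))
    return "\n".join(blocks)
```

With the default interval, a 650-word text would carry the banner three times: at the head of the text and again before words 301 and 601.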
Meetings are held with specific departments to ex- 
plain the importance of feedback and answer any 
questions raised. The help desk also offers scanning 
facilities for those departments which are not so 
equipped. The kinds of problems encountered by
users range from lost texts to unfamiliarity with
electronic mail procedures.


The users 

When the raw machine translation service was first 
offered for general use on a help-yourself basis, we 
had little idea who was using the system. Requests 
came from machine numbers and cryptic passwords 
which could not be identified. Identification has 
since become much easier and the creation of a data 
base of users has enabled us to keep a close track on 
who they are. There are about 2,000 users, 20% of
whom are in the Translation Service and 80% in the
other Commission departments. Certain departments
make much more use of MT than others, depending
on the type of their work and their specific informatics
environments. Of the total number, 30% (about 700)
are regular users, that is, they have requested at least
five translations per month. 

Once users had been identified, the next logical 
step was to ascertain their needs. An in-depth survey 
on the use of machine translation at the Commission 
was carried out. Officials were interviewed person- 
ally using a questionnaire specifically devised for 
the purpose. Data from the survey, over a period of 
twelve months, show that the system is predomi- 
nantly used for the translation of short texts (2 or 3 
pages) and for correspondence, minutes of meet- 
ings, summaries, notes or reports. Interventions on 
the original text are infrequent and are limited to a 
spelling check or the formatting of the document. 
Use of simplified syntax when drafting the original 
is very rare. On the other hand 90% of users correct 
the raw versions, in most cases with the help of 
colleagues whose mother tongue is the target lan- 
guage. Contrary to what we supposed, the vast 
majority of post-edited texts are not limited to inter- 
nal diffusion but are destined for a wider audience. 
It is reasonable to predict that the use of machine 
translation for browsing purposes (i.e. as a reading 
tool) would be much greater if the lesser used 
languages were available as source languages. 

To summarise, raw machine translation has three 
distinct applications within the institution. First, in 
the operating departments, it is used as a translation 
tool, particularly for urgent or short documents 
which cannot be handled in time by the Translation 
Service. In the Translation Service itself, as a result 
of the applications currently being developed, in- 
terest is shifting towards exploiting the system as a 
terminology pre-processing tool. Secondly, it is 
often used in Commission departments as a drafting 
tool. The author requests a Systran translation when 
he or she is required to write in a language other 
than his or her native tongue. Finally, to a limited 
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quality of the source text and its suitability for ma- 
chine translation. If there is one rule which can be 
applied, it is that of economy of means. If the text is 
suitable, the post-editor will choose the most direct 
route, the simplest solution, and resist the temptation 
to introduce his or her own linguistic refinements. 
The job is not translation itself, but it requires an 
experienced, fast and efficient translator, who can 
make a text comprehensible by means of the least 
number of changes. The skill of the post-editor re- 
sides in his or her ability to judge the seriousness of 
mistakes and to determine to what extent they need 
to be corrected. 

In addition to this external post-editing service, 
a limited amount of post-editing is also done within 
the Translation Service itself. In a number of iso- 
lated cases Systran is used as an aid for human 
translators. Suitable documents are identified by in- 
house translators from their own unit's workload. 
An important spin-off from these post-editing ac- 
tivities within the Translation Service is the feedback 
that can be channelled to the MT development 
team. The incorporation of feedback into the Systran 
dictionaries involves a number of stages, the first of 
which is the detection of suitable documents and 
the analysis of Systran translations from the point 
of view of terminology. Those terms which are 
deemed to benefit the entire user community are 
then coded in Systran's general dictionaries with a 
view to obtaining an acceptable quality in the target 
language. Specific dictionaries are then created for 
atypical errors and expressions in relation to the 
general Systran dictionaries, so as not to affect the 
overall stability of the system. And the final stage in 
the process is post-editing proper on PC. The ex- 
periment has been judged favourably by the 
translators involved. Two updates have been made 
to the system to incorporate the feedback of transla- 
tor/post-editors and the improvement in the quality 
of the translation has been judged to be most en- 
couraging. 


Information for users and help desk 

Contact with users in the operating departments was 
made easier by the setting up of a machine transla- 
tion help desk, which also manages the post-editing 
service. It is used a great deal. Good communication 
with users of machine translation is a vital part of 
the operation. Users' expectations of the product of 
the machine should not be unduly high. Users need 
to be warned about the quality of MT output if they 
are to avoid disappointment. Moreover, they need to 
know which type of document lends itself to ma- 
chine translation. Some attempts have also been 
made to introduce the notion of ‘writing for the
machine’. So far, these have been limited to recommendations
regarding simplified syntax in the drafting
of texts prior to their submission for MT. Straightforward
directives regarding the formatting and
transmission of texts have been more successful. 
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and throughout the various Commission departments. 

The new interface, affectionately known as Euramis,
would offer three distinct products in addition to raw
machine translation:

1. identification by Systran of Celex references (directives, regulations, etc.) in the source text, and provision of complete titles in the target language

2. Eurodicautom terminology look-up from text

3. Eurodicautom terminology look-up from user's list of terms.


1. Celex bridge 
One example of highly successful synergy between 
information tools is the creation of a bridge be- 
tween the MT system and Celex, the multilingual 
data base containing Community legislation in the 
nine official languages. A large proportion of the 
original documents received by the Translation 
Service's planning units contain references to ti- 
tles of legislative acts, such as regulations, 
directives or decisions. Translators spend valuable 
time searching in the Celex database or in the 
Official Journals to check that the title is correctly 
expressed in the appropriate target language. Every 
document in the Celex base has a unique reference 
number, which is the same for all language ver- 
sions of that document. A specific algorithm was 
devised: any references to Community legislation 
contained in a source document are recognized at 
the analysis stage of the MT process; the reference 
number is automatically generated; and a search is 
made in the relevant target version of the Celex 
data base. The correct title, along with its publica- 
tion reference, is then reproduced at the end of the 
raw translation. Hence, a routine has been inte- 
grated into Systran, which makes it possible to 
extract automatically from the Celex base the 
title(s) in the target language corresponding to the 
reference number mentioned in the source text. 
The incorporation of this routine has not only 
proved to be an extremely useful tool for transla- 
tors. It has also opened the door to other ways of 
turning the machine translation system to account. 
Why could the Systran text analyzer not be used 
as a means of exploiting other types of text pre- 
processing? 
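
The three steps of the Celex bridge (recognize the reference, derive a language-independent number, retrieve the title in the target language) can be sketched as follows. This is only an illustration under stated assumptions: the citation pattern covers a single style ("Directive 90/270/EEC"), the key scheme is invented for the example and is not the real Celex numbering, and a plain dictionary stands in for the Celex data base.

```python
import re

# Assumed citation style only, e.g. "Directive 90/270/EEC".
CITATION = re.compile(r"\b(Directive|Decision)\s+(\d{2})/(\d+)/EEC\b")

# Illustrative one-letter codes for the type of act.
KIND_CODE = {"Directive": "L", "Decision": "D"}


def reference_key(match: re.Match) -> str:
    """Build a language-independent key (invented scheme) from a citation."""
    kind, year, number = match.groups()
    return f"{KIND_CODE[kind]}-{year}-{int(number):04d}"


def append_titles(source_text, raw_translation, titles, target_lang):
    """Append the target-language title of every act cited in the source.

    `titles` is a hypothetical stand-in for the legislation data base:
    a dict mapping (key, language) to a title string.
    """
    keys = []
    for match in CITATION.finditer(source_text):
        key = reference_key(match)
        if key not in keys:          # list each act once, in order of appearance
            keys.append(key)
    lines = [raw_translation]
    for key in keys:
        title = titles.get((key, target_lang))
        if title is not None:        # unknown references are silently skipped
            lines.append(f"[{key}] {title}")
    return "\n".join(lines)
```

Because the key is the same for every language version, the same recognition step serves all target languages; only the final look-up changes.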

The Informatics Department of the Translation 
Service had in fact developed two email based 
servers which provided multilingual services en- 
tirely automatically. One server handled raw 
machine translation requests. The other provided 
batch look-up of Eurodicautom, the Community's 
nine-language Terminology Data Bank, looking up 
lists of terms in a given source language and return- 
ing corresponding terminological data in one or 
more target languages. Both servers were based on 
common principles and a common software infra- 
structure. Consequently, it was a relatively simple 
matter to establish bridges between the servers in 
order to provide new products. 
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extent, because of the specific language combina- 
tions available, it is used as an information tool for 
browsing purposes: the Systran translation is re- 
quested to enable the reader to understand a text 
written in an unfamiliar language. The reader can 
decide to ask for a translator's help with the whole 
or only part of the text, or to discard it if it is not 
relevant. 


Technical developments

The increase in the number of languages, texts, and 
specialized areas which the Translation Service must 
cover has led it to seek ever more advanced technical 
solutions in order to fulfil its mission at the lowest 
cost. On the one hand, attempts are being made to 
improve the structure of written communication by 
rationalizing and standardizing original text. On the 
other, the memory, speed and capacity of the ma- 
chine are being placed at the service of officials 
working with many languages. 


Informatics 

The Commission has decided to switch from a 
Unix-based microcomputer network as its primary 
working environment to a personal computer net- 
work. This has meant a lot more work for the 
informatics experts of Systran. Access to the sys- 
tem had been tuned to the Unix word processing 
system and the decision to tolerate two word process- 
ing packages (WordPerfect and Word for Windows) 
within the institution did not simplify matters. Ab- 
solute priority was therefore given to extending the 
range of formats accepted by Systran (WordPerfect 
and WinWord). The interim period has been trying, 
with both Unix and PC word processing systems in 
widespread use throughout the Commission. Systran
is particularly sensitive to any invisible codes lurk- 
ing in the texts after conversion and these can cause 
problems of analysis. These procedures are still 
being stabilized and users will have to exercise a 
little patience before a complete service can be 
offered, particularly for Greek. 


Bridges 

Following the switch to personal computers at the 
Commission, a new, improved user interface had to 
be created. From our contacts with potential custom- 
ers of machine translation it was clear that lack of 
familiarity with informatics access procedures to 
Systran constituted a barrier to its use and a user- 
friendly interface for Windows was needed. It has 
been designed to guide the user through the different 
stages of his request. 

At the time various experiments were in hand. 
They explored ways of making the most of the 
machine-translation system by exploiting its poten- 
tial as a text pre-processing tool. It was suggested 
that the new interface could usefully be extended to 
make a number of different multilingual tools gener- 
ally available, both within the Translation Service 
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Systran (of which more), which has significantly 
enhanced the initial hit rate. At the same time refinement
of the Eurodicautom batch programs will reduce
the amount of irrelevant data provided. 
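
The look-up of terminology in running text (option 2) amounts to a two-stage pipeline: an analysis step that lists the terms recognized in the source text, followed by a look-up of those terms in the terminology bank. A minimal sketch is given below. The function names are hypothetical, a simple longest-match word search replaces the syntactic and morphological analysis that Systran actually performs, and a small dictionary stands in for Eurodicautom.

```python
import re


def recognise_terms(text, known_terms):
    """Return the known terms found in `text`, in order of first appearance.

    Stand-in for the analysis step: plain lower-cased word matching that
    prefers the longest term at each position, with no syntactic or
    morphological analysis. Only unaccented letters a-z are tokenized.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    longest = max((len(term.split()) for term in known_terms), default=0)
    found = []
    i = 0
    while i < len(tokens):
        for n in range(min(longest, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in known_terms:
                if candidate not in found:
                    found.append(candidate)
                i += n               # skip past the matched term
                break
        else:
            i += 1                   # no term starts at this word
    return found


def look_up(terms, term_bank, target_lang):
    """Return {term: equivalent} for terms with an entry in `target_lang`."""
    return {
        term: term_bank[term][target_lang]
        for term in terms
        if target_lang in term_bank.get(term, {})
    }
```

Since the second stage depends only on the terminology bank, a target language that has no Systran language pair can still be served, which is how a combination such as French-Danish becomes possible.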


3. Eurodicautom batch queries 

The third option is terminology batch queries. Lists 
of terms in a given source language are submitted to 
the Eurodicautom server which returns the output in 
one or several target languages. All current language 
combinations (72) are supported. For people with a 
more specialized interest in terminology, the inter- 
face will offer filters for queries, enabling the user to 
determine the amount of information that is required, 
such as definitions, references and so on. Subject 
fields can be indicated and the scope of the answers 
can be controlled by selecting the desired level of 
match of text items. 
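
The filters described above can be pictured as fields of a query record. The sketch below is an assumption-laden illustration, not the Eurodicautom batch interface: the class and field names are invented, entries are plain dictionaries, and the "level of match" is reduced to a choice between exact and substring matching. It also shows where the figure of 72 comes from: nine languages, each paired with the eight others.

```python
from dataclasses import dataclass, field
from itertools import permutations

# The nine official languages; 9 * 8 ordered pairs gives the 72 combinations.
LANGUAGES = ["da", "de", "el", "en", "es", "fr", "it", "nl", "pt"]
LANGUAGE_PAIRS = list(permutations(LANGUAGES, 2))


@dataclass
class BatchQuery:
    terms: list
    source: str
    targets: list
    subject_fields: set = field(default_factory=set)  # empty set = any field
    extras: set = field(default_factory=set)          # e.g. {"definition"}
    exact: bool = True                                # False = substring match


def run_batch(query, entries):
    """Filter hypothetical term-bank `entries` (a list of dicts) by `query`."""
    rows = []
    for term in query.terms:
        for entry in entries:
            if entry["lang"] != query.source:
                continue
            if query.subject_fields and entry["subject"] not in query.subject_fields:
                continue
            hit = entry["term"] == term if query.exact else term in entry["term"]
            if not hit:
                continue
            row = {
                "term": entry["term"],
                "translations": {
                    lang: entry["translations"][lang]
                    for lang in query.targets
                    if lang in entry["translations"]
                },
            }
            for name in query.extras:   # optional data such as definitions
                if name in entry:
                    row[name] = entry[name]
            rows.append(row)
    return rows
```

Keeping the filters in the query rather than in the server call means that a casual user can accept the defaults while a terminologist narrows the output by subject field or asks for definitions and references.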

The embryonic development of the new graphi- 
cal interface, Euramis, continues. It has been available 
for testing among a restricted population since the 
summer. The prime concern of the developers is that 
it should be easy to use. Eventually, when it is on 
general release, it should enable the uninitiated and, 
hopefully, even the computer-shy to visualize the 
various multilingual products offered by the Trans- 
lation Service in an integrated package. 


Importation of Eurodicautom 

Returning to Eurodicautom — here were two rich and 
extensive sources of terminology, Systran and 
Eurodicautom, sitting side by side and functioning 
independently. Surely this was pointless and wasteful. 
Why not enrich Systran with the resources of the 
Community's terminology data bank? But there were 
daunting technical problems in the way of bringing 
the two together. The main obstacles to the success of 
the operation were the three fundamental differences 
between Eurodicautom and Systran dictionaries: 






1. Eurodicautom: no grammatical information available.
   Systran: basic grammatical information needed.
   Solutions: default rules (for word class, gender etc.) on the basis of existing dictionaries; automatic detection of the principal word in multi-word units.

2. Eurodicautom: several solutions for one term in the same subject field.
   Systran: choice of one solution necessary.
   Solutions: automatic detection of ‘best possible’ equivalent; remaining solutions will be stored in comment lines.

3. Eurodicautom: domain-oriented subject fields.
   Systran: user-oriented topical glossaries.
   Solutions: for the time being, a filter between Systran topical glossaries and corresponding Eurodicautom subject fields¹; eventually, automatic detection of subject field by means of statistical methods.

¹ The Eurodicautom subject field classification is more detailed than the Systran topical glossaries (e.g. one topical glossary
for Biology, Medicine, Chemistry and Environment, with four corresponding subject fields in Eurodicautom).


2. Terminology look-up of text

One approach seemed particularly interesting. The
idea was to combine Systran source-text analysis
with Eurodicautom terminology look-up. In this way
a system was constructed which identifies possible
terminology within running text and then provides
the relevant Eurodicautom entries in one or other
target language. As an experiment, a bridge between
the existing Systran and Eurodicautom servers was
established. The procedure is quite simple and was
developed entirely from existing possibilities. Take
an example based on English-French. The English
text is first introduced into Systran for basic analysis.
The output from Systran is not, however, any kind of
translation, but simply a list of English terms which
have been recognized in the Systran dictionaries,
following syntactical and morphological analysis of
the text. The English expressions are then looked up
in Eurodicautom and the corresponding French data
extracted. The combined English and French
Eurodicautom data are returned to the requesting
user. In short, bilingual terminology can be gener-
ated automatically from an arbitrary text. The limiting
factor is the number of source languages that Systran
can analyze. However, for each of the four source
languages Eurodicautom can provide eight target
languages. Consequently a Systran/Eurodicautom
hybrid can support a total of 32 language combina-
tions. Automatic terminology look-up can therefore
be provided for language combinations such as
French-Danish, which do not exist in Systran at all.
Initial tests revealed a number of weaknesses. At first
the Systran hit-rate was too low (not enough poten-
tial Eurodicautom terminology was recognized). The
Eurodicautom hit-rate was too high with too much
Eurodicautom data in output. Finally, the presenta-
tion of Eurodicautom output needed to be refined.
These problems have been substantially reduced by
the coding of Eurodicautom terminology within
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they may, in fact, use the raw machine translation as a 
means of selecting specific sections of a document for 
further, human translation. 

In short, the priorities for language development
are threefold: a) consolidation of the three basic
pairs of the system between the three working languages
of the institution, and hence priority given
to German as a source and target language; b) development
or acquisition of language pairs with
non-vehicular source languages into one of the working
languages (i.e. the one with which it has the most
affinity); c) promotion of co-financing for non-priority
languages.


Linguistic development 

General linguistic development work is always based 
on feedback sent by users via the Systran promotion 
team in Brussels. This systematic work on ‘live 
texts’ is necessary to improve the linguistic content 
of existing language pairs. One aspect of linguistic 
development involves the introduction of terminol- 
ogy specific to individual departments. In addition to 
feedback from regular clients the list of not-found 
words generated automatically at the end of each 
translation is encoded. The second aspect of this 
development work, the improvement of programmes, 
is based on systematic detection of errors in the 
analysis or synthesis programmes. Errors are ar- 
chived according to their type, frequency and 
importance. Development is based on a strict hierar- 
chy. The most serious and most frequent errors are 
dealt with first. Development work varies according 
to the ‘age’ of the couple. The youngest couple, 
German-French, for example, contains enormous 
gaps, both at the level of the dictionaries and at the 
level of the programmes, whereas work on French- 
English, the Darby and Joan of the Commission’s 
system, is concentrated mainly on well-targeted 
texts in fields for which specific needs have been 
identified. The rate of progress of a language pair is 
also dependent on the number of people in charge of 
its development. It is therefore necessary to allocate
the available resources as effectively as possible according
to the priorities which have been fixed. In recent
months priority has been given to the importation of 
Eurodicautom into Systran but once this operation is 
complete there will be monthly ‘low-risk’ updates. 
An acceleration in the rate of updates can increase 
the level of satisfaction of regular users thanks to a 
more rapid adaptation of the system to their needs. 
Every effort is made to maintain close collaboration 
with users. 
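
The triage rule described above (errors archived by type, frequency and importance, with the most serious and most frequent dealt with first) can be sketched as a simple sort. This is only an illustration under stated assumptions: the `ErrorRecord` structure, its field names and the numeric importance scale are hypothetical and are not taken from the Systran development tools.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorRecord:
    """One archived error (hypothetical structure, for illustration only)."""
    error_type: str   # e.g. "analysis" or "synthesis"
    frequency: int    # number of times seen in live texts
    importance: int   # assumed scale: a higher number means more serious


def triage(errors):
    """Order errors most serious first, then most frequent first."""
    return sorted(errors, key=lambda e: (-e.importance, -e.frequency))


if __name__ == "__main__":
    archive = [
        ErrorRecord("analysis", frequency=5, importance=1),
        ErrorRecord("synthesis", frequency=2, importance=3),
        ErrorRecord("analysis", frequency=9, importance=3),
    ]
    for record in triage(archive):
        print(record)
```

Sorting on importance before frequency is one reading of the "strict hierarchy"; a team could equally weight the two criteria differently.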

With the number of MT requests constantly on 
the increase, some organizational measures had to be 
taken. Every text processed by Systran is now classi- 
fied according to the type of document, the domain 
(according to the Eurodicautom classification sys- 
tem), the requesting server and the number of pages. 
A corpus has thus been created which is updated 
daily. This corpus constitutes a basis for evaluating 
Comparative tests of Systran dictionaries with
and without the Eurodicautom entries are being carried
out to measure qualitative improvements in the
ried out to measure qualitative improvements in the 
translation of technical texts. Further tests are being 
run at various levels relating to the accuracy of 
linguistic strategies concerning the different lan- 
guage pairs and to general strategies concerning 
dictionary look-up. The procedures will have to be 
refined as errors occur and improvements will be 
part of normal development work. The informatics 
procedures for updating the dictionaries are being 
adapted to the new structures and types of informa- 
tion. During the work on the importation of 
Eurodicautom into Systran, important feedback has 
been forwarded to the Eurodicautom team concerning
such things as missing codes and spelling mistakes.

The main benefits to be anticipated are, on the 
one hand, improved quality of Systran translations 
for all texts of a technical nature, particularly in 
those fields insufficiently covered by Systran, and, 
on the other hand, the development of Systran as a 
terminological pre-processing service. 


The future 
Language policy 
Let us look to the future, first of all in regard to 
language policy. If we consider the breakdown of 
production for the various language pairs, French- 
English and English-French are by far the most used, 
reflecting the higher quality of these linguistic com- 
binations in Systran. However, the user survey 
revealed that nearly 90% of users wish for an im- 
provement in the quality of German as a source 
language to enable rapid reading. The priority given 
to the other source languages, in decreasing order, is 
Dutch, Danish, Greek, Italian and Portuguese.
What are our priorities for future language 
development in machine translation? Internal com- 
munication is in fact covered by the French-English 
and the English-French pairs, where quality has 
reached an acceptable level, provided the right type 
of text is submitted for processing. Our immediate 
concern is to improve German in the system, both as 
a source and as a target language. German is a 
language in which not all Commission officials are 
proficient and in which there is a great deal of written 
communication. In the longer term, the strategy is to 
reverse the pattern of development of language pairs 
from non-vehicular source languages into the main 
languages of communication within the institution, 
and indeed tolerance of machine quality is highest for 
these combinations. Machine translation should be 
made available from the lesser-known source lan- 
guages into the working language they most resemble 
(from Italian, Portuguese and Greek into French; and 
from Dutch and Danish into English). Hence, brows- 
ing requirements can be met and Commission officials 
who are not fluent in one of those languages can 
obtain rudimentary translations of documents written 
in less widely known languages. As we have seen, 
2. for documents circulating within the Commission,
a machine translation system is required for
understanding, as an aid to drafting and as an aid 
for the preparation of working documents in the 
three working languages (French, English and 
German) 

3. for the preparation of documents to be dissemi- 
nated outside the Commission, here the need is 
perhaps not for machine translation as such, rather 
for tools which would facilitate the preparation of 
high quality, publishable translations. It may well 
be that these different usages might imply the use 
of different types of systems. 


Conclusions 

I should like to conclude with the following re- 
marks. First of all, potential users of MT must be 
able to draw a clear distinction between the product 
of a machine and the work of a human translator. 
They must know which type of text can safely be 
entrusted to the machine. The product of the machine 
is useful for ephemeral texts of an informative nature 
and preference should be given to simple texts where 
certain stylistic rules have been respected. If all these 
conditions are met, the machine will be able to cover 
an ever greater share of the requirements of multilin- 
gual communication within an institution such as the 
Commission. Machine translation has to be pre- 
sented as a rapid communication tool for use by staff 
in the various operational departments and not only — 
as in the past — as a tool for translators. 
the development of the system. Tests can be carried
out at any moment on a specific type or domain. 
Some new aspects are being handled by the de- 
velopment team. Until recently it was not possible to 
‘teach’ the system how to translate sentences which
occur regularly in repetitive texts. A programme has
now been implemented, in a pilot version, which 
recognizes fixed sentences and integrates them into 
the Systran output, replacing them by their pre- 
defined translations. Certain types of variation can 
be handled, but the system is not as powerful as 
‘standard’ translation memories in that it does not
treat fuzzy matches to the same extent and transla- 
tion equivalents have to be established sentence by 
sentence. In other words, there is no automatic alignment
for whole documents. Its advantage lies in that
it is already fully integrated in Systran. Our ultimate 
aim, however, is to introduce an existing powerful 
translation memory into the translation process. For 
this reason, the pilot version is being used by a 
limited population within the Translation Service. 
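
The fixed-sentence mechanism described above can be sketched roughly as follows. This is a minimal illustration and not the Commission's pilot programme: the class name `FixedSentenceStore`, the whitespace and case normalisation, and the `machine_translate` callback are all assumptions made for the example. As in the text, matching is sentence by sentence, with no fuzzy matching and no document-level alignment.

```python
import re


def normalise(sentence: str) -> str:
    """Collapse whitespace and case so trivial variations still match (assumed rule)."""
    return re.sub(r"\s+", " ", sentence).strip().lower()


class FixedSentenceStore:
    """Maps known source sentences to pre-defined translations (hypothetical)."""

    def __init__(self):
        self._pairs = {}

    def add(self, source: str, translation: str) -> None:
        # Equivalents are registered one sentence at a time.
        self._pairs[normalise(source)] = translation

    def translate(self, sentences, machine_translate):
        """Use the stored translation when one exists, else fall back to MT."""
        output = []
        for sentence in sentences:
            fixed = self._pairs.get(normalise(sentence))
            output.append(fixed if fixed is not None else machine_translate(sentence))
        return output


if __name__ == "__main__":
    store = FixedSentenceStore()
    store.add("Le présent règlement entre en vigueur.",
              "This Regulation shall enter into force.")
    text = ["Le  présent règlement entre en vigueur.", "Autre phrase."]
    # The lambda stands in for the raw machine translation step.
    print(store.translate(text, lambda s: f"[MT] {s}"))
```

An exact-match lookup of this kind is cheap to integrate into an existing pipeline, which is consistent with the advantage the text claims for the pilot, while the lack of fuzzy matching is what a full translation memory would add.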


Future needs 

The perceived needs within the Commission for
machine translation fall into three categories:

1. for documents coming into the Commission, the
main need is for a system that will allow brows- 
ing, in the sense that a reader should be able to 
follow the general argument of a text with suffi- 
cient confidence to know whether it merits more 
accurate or in-depth treatment 
It is the quality of the factor inputs of capital and
labour that matters. These ideas lead to the twin
concepts of a knowledge economy and information 
capitalism as the paradigms that underlie the behav- 
iour of the new information driven economies and 
businesses. The military are in no doubt about the 
importance of knowledge and information. For 
example information is identified as one of the prin- 
cipal attributes of air power, ‘...air power is dependent 
upon the power of information. Information has a 
value and worth all its own...’ Indeed ‘information 
mastery’ is nominated as one of the main attributes 
of air power and of the modern industrial state 
(Hallion). 

Information mastery is more than an enticing 
slogan, for it is what managers would wish their 
information sources to provide. So it will be interest- 
ing to see how information management relates to 
information mastery, and what might be entailed if 
‘information mastery’ were to become a require- 
ment. 


The operational information domain 
The study of information use in an organization has 
to be utilitarian with a top-down orientation. Infor- 
mation may be compiled for its intrinsic value (i.e. 
the love of it) in universities, museums and libraries,
but information in organizations is required for its 
utilitarian value as a strategic and operational re- 
source. Managers, as well as knowledge and process 
workers (whether human or automated), have to be 
fed with the requisite variety of information so that 
they may fulfil their tasks, control operations, and 


The enterprise as a knowledge economy 

An enterprise exists as a congress of information and 
knowledge: it is the knowledge embedded in it, and 
the processing of information throughout it, that lead 
to the strategy, change and action that convert the 
enterprise into a dynamic reality. Management, how- 
ever else it may be defined, is also the process that 
ensures that information and knowledge are effec- 
tively deployed in an enterprise's interests. In this 
the manager is joined by all the knowledge workers 
in the organization — the designers, researchers, engineers,
trainers, administrators, accountants — who
provide the enterprise with its collective brainpower. 
That brainpower, focused on the wellbeing and suc- 
cess of the enterprise, is the true generator of value 
added and must therefore be the chief asset of any 
business or institution. 

The realization that knowledge and information 
provide the fundamental drivers of economic growth, 
whether at national or company level, is beginning to 
permeate economic and management thinking. It 
comes from the observation that conventional ideas 
based on the old definitions of capital and labour do 
not explain the explosive and sustained growth of the 
‘Pacific Tiger’ economies and the new information 
based businesses that don’t make tangible widgets. 
The accent in economic growth theory is now on the 
‘human capital’ that applies knowledge and informa- 
tion to provide the competitive advantages that come 
from the generation of (i) technical changes that lead 
to new investment and commercial success, and (ii) 
governmental and international institutions that en- 
courage the unfettered flow of trade and information. 
reporting > distribution to users, and (ii) the acquisition
of the requisite variety of information that supports
the operational requirement. That term ‘requisite vari- 
ety’ is important as it means that all the information 
necessary for the knowledge workers to be able to deal 
effectively with all the variables and uncertainties in 
the organization being managed (including themselves) 
is available and on tap. It also implies that managers 
use all the requisite information. 

In general the requisite variety of information 
supply for ‘information mastery’ is that which will 
enable the managers to understand the organization, 
manage and control its processes and operations, 
make suitable inferences and strategy from surveil- 
lance of the external environments, deal with 
uncertainty and risk, and come to optimal decisions 
within a broad set of values. The requirement may be 
divided into four broad categories, e.g. 


contribute towards the achievement of the organiza- 
tional objectives. Thus the starting point for an 
information audit is to establish the ‘identity of the 
enterprise' in terms of the enterprise's objectives. 
The consequent total information requirement may 
then be expressed in terms of the needs of the manag- 
ers, systems and processes, and so on through the 
information value chain and the flow lines of infor- 
mation delivery and systems, ending up with an 
assessment of the adequacy (value added) of the
formal information management systems.


This utilitarian perspective generates a concep- 
tual model in which information handling and 
knowledge work provide links within a cybernetic
management cycle. 

Information handling covers (i) the cost-effective 
deployment of the processes of information management,
i.e. acquisition > categorization > storage >
(Davenport 1993). It is the human client who determines
what information is actually used — and what
is rejected or ignored — depending on how structured 
the client's function is within the organization, on 
the immediate urgency, on the quality of formal 
information provision, and on the prevailing cultural 
attitude to information and reflection. Requisite vari- 
ety tends to get lost inside this jumble. 

Managers and in-house information providers 
continue to complain about each other. Information 
delivered to the user is, for example, ‘too late, too 
little, too much, inappropriate, unfocused’. On the 
other hand providers worry about the narrowness of 
the users’ information base distilled only from ex- 
ecutive summaries, some journals, short scans of the 
media and the gossip network. If such scenarios 
prevail they mean that management does not ascribe 
much value to the content and modes of delivery. 
And perhaps the customers are right. However much 
and however excellent the formal delivered informa- 
tion may be there is a strong risk that it will not be 
used unless it also meets the user’s current cognitive 
needs. This must mean that the formal information 
offers less (perceived) value added than that already 
grazed from an informal network with the gaps being 
filled in by guesswork. 

This familiar argument suggests that there is a 
mismatch at the interface between information pro- 
vision and the needs of the human cognitive processes. 
Or, to put it another way, the design of information 
management systems must consider the best way of 
presenting information to the thinking brain as well 
as trying to ensure that the requisite variety of infor- 
mation is available. Information systems are primarily 
human systems with professional and computer rein- 
forcement. 

Perhaps this mismatch goes some of the way to 
explaining the low rating of the value of informa- 
tion: if managers and decision-makers feel that they 
can safely ignore much of what is offered to them 
from formal information sources, they are effec- 
tively saying that those sources do not provide 
much value added. Focus the argument now on the 
collector box for value added in the centre of figure 
2. Of the four feeds into it only that from the 
operating processes has a real (tangible) component 
accumulating from the cash flow generated. The 
other three feeds all derive from information and 
knowledge in one way or another. That from the 
formal information handling systems arises because 
the systems have gathered information and made it 
available (intrinsic value). The value of organization 
comes from the relative effectiveness with which 
the organization as a whole responds to its direction 
(enabling value). Last, the value of the cognitive 
system is a measure of the effectiveness of manage- 
rial thinking, methodology and enterprise. The entire 
value of the enterprise can be credited to this locus 
as it is instrumental to the creation, success and 
welfare of the organization as a whole (instrumen- 
However there is never enough information:
the fog of war applies as much to business man- 
agers as to military commanders. So ‘information 
mastery’ also implies the admission of informa- 
tion incompleteness and the consequent need for 
the effective management of uncertainty, risk, 
intelligence and forecasts — which is knowledge 
work. 


The Informatic System 

Structure and value 

Looking at that first conceptual diagram it is appar- 
ent that it has two parts: an informatic part and a 
managerial part. These two parts can be formalized 
into system groupings as shown in the next figure. 
The accent is on the informatic system covering 
information handling and knowledge work. 

Now the human element has entered into the 
scheme, bringing with it the personal idiosyncrasies 
with which humans handle information and think. 
The Informatic System is depicted with three opera- 
tional processes (information handling, filtering and 
thinking), and another environment has appeared - 
the cultural environment that conditions the way 
people think and come to decisions. 

The Informatic System contains two parallel sets 
of information handling and filtering processes: one 
exists as formal components of the real organization, 
the other process is the ad hoc informal and personal 
‘system’ operated by every human. The formal proc- 
esses are such entities as MIS-EIS, information 
centres, research groups and the various (analytical) 
teams that provide information on the organization’s 
state and progress. Between them these formal sys- 
tems attempt to provide the requisite variety of 
information, although their performance and cover- 
age may be very variable in practice. In addition 
every human manager and knowledge worker has an 
informal personal network for ‘keeping in touch’, 
e.g. encounters and meetings, phoning around, scan- 
ning a few selected news media and journals. 

A potential fault line exists between the formal 
and personal processes because they compete for an 
individual’s attention. The competition occurs be- 
cause, in the informatic context, there are only two 
channels of communication into the human brain — 
eyes and ears; consequently all formal information 
and analysis has to compete with any other informa- 
tion acquired via the user's personal system. Even 
automated information processing, following its 
specified electronic pathways, is still likely to be 
offered to a human client via a computer screen or 
printer (unless the context is a process that is com- 
pletely automated from beginning to end). 

There is considerable anecdotal evidence to sug- 
gest that in the less structured functions of 
management (i.e. higher up the chain) managers rely 
more on their personal networks than on formally 
delivered information — anything up to 85% of input 
can be grazed from the ad hoc personal system 
ple data feeds), Thinking (machine intelligence),
Monitoring (automatic inspection and MIS). Con- 
nections are made to the outside world via telecoms 
and the private networks. 


The informatic chain is broken at the communi- 
cations interface between the Cognitive System 
(brain) and the external world. This is the point 
where information of any kind has to be scanned by 
the user and accepted as a perception by the brain, 
where it may or may not be absorbed, depending on 
how interesting the content is. This is where the 
design mismatches occur between information pro- 
vision/presentation and the informatic support 
required by the thinking human brain. This is where 
the assumptions made about the rational model of 
human decision making begin to be questioned. It is 
here that the design of printed and pictorial represen- 
tations of information become important, and where 
the design of the human-computer interface (HCI) 
becomes a key factor for information mastery. This 
is where studies of human decision-making behav- 
iour and cognitive psychology should be dragged in 
to become important aspects of information system 
design and management. 


An agenda for information mastery 
The managerial requirement for the informatic system
is to: 
provide satisfactory procedures of acquisition, 
surveillance, intelligence, processing and delivery
that are matched to human cognitive action to
ensure that all that needs to be known to produce 
cost-effective strategy and operation is known 


tal value). It is worth noting that the intrinsic and 
instrumental value components originate within the 
Informatic System. 

Clearly information mastery will be a major fac- 
tor in the effectiveness of the Informatic System and 
of the consequent value of the enterprise. But value 
will not be realised in full if information is less than 
fully mastered — which will be the case while that 
mismatch occurs between information delivered and 
cognitive needs. 


The domain for information mastery 

Information mastery involves more than good infor- 
mation management: it also encompasses human 
networking and thought, and is reinforced by the 
effectiveness with which IT, computers and telecoms
are used to reinforce information management and
human cognition. But the key to information mastery
lies at that interface between information systems 
and cognitive systems — which is a function of 
information management. This is demonstrated in 
Figure 3. 

The Informatic System is exploded in Figure 3 to 
show the interactions between information manage- 
ment, human networking and cognition, and the 
integrated technological domain of IT, computers 
and telecoms. The fundamental cyclic process of 
Informatics is shown in abbreviated form: Acquiring 
> Reporting > Filtering > Thinking > Transmitting > 
Monitoring >> and repeat. The technological do- 
main can support or reinforce this cycle via the 
deployment of various computer aids: Acquiring 
(databases), Filtering (intelligent scanning of multi- 





Figure 3: The Informatic System (2) The domain for information mastery 
Delivery involves optimal filtering, layering and
scheduling between
(i) direct feeds to the agent;

(ii) intelligence systems supplying surveillance, and 
consequential analysis and interpretation of 
operational, tactical and strategical trends — and 
thence to the agent; 

(iii) support systems that undertake detailed model- 
ling and analysis of the overall problematique 
(workstations with matching analytical support, 
and professional analytical groups working off- 
line); 

(iv) current and professional awareness to update 
the agent's mind-set. 

Each of these layers has a function and time scale 
of its own which should be scheduled appropriately
in a well integrated system. 

The presentation of information should: 

(a) match both the managerial requirement and the 
mental tasks deployed at each stage in meeting 
the requirement; 

(b) be presented in a manner that encourages hu- 
man intuition, supports ratiocination, and 
integrates the two (i.e. the left and right lobe 
modes of thinking are integrated). 

A good match between delivered information
and the cognitive requirements of thinking will come 
when information science and cognitive science can 
be pulled into a common framework. But that is 
easier said than done because at present we can only 
see into the human brain through rather dark lenses 
provided by many jostling research disciplines such 
as psychology, neuroscience, cognitive science, the 
decision sciences, information science, semiotics, 
graphics design and HCI design. 

Cutting bravely through the research thickets the 
main mental actions involved in managerial thinking 
may be tentatively named as: Wishing > Retrieving >
Structuring > Transforming > Imagining > Evaluat- 
ing > Judging > Storing. These elements have a 
sequential feel about them — the cognitive process — 
but they are embedded in a complex network of 
jumps and cycles as the thinker alternates between 
soft and hard modes, i.e. between intuition and reason.
The more ad hoc the process the more intuitive
(and faster and casual) is the thinking; the more 
structured the process the more it is being aligned 
with the formal (and slower and thorough) model of 
rational problem solving. What happens in practice 
is probably akin to an internal dialogue between 
reason and intuition as the thinker tries to give struc- 
ture to the initial jumble, reduce the uncertainty, and 
focus on an emerging resolution that satisfies con- 
flicting desires and interests. 

The information delivered to the thinker should 
match the general and sequential requirements of the 
cognitive process: it should encourage the creative 
insight and help to test that insight against reasoned 
models of behaviour and probability. If the delivered 
information does not seem to help the thinker it is
and converted into efficient action which all means
that the value added by the informatic system
must be optimal.

This rather idealistic statement becomes an 
informatic requirement if information mastery is the 
aim. The requirement for the ideal Informatic System 
that will deliver information mastery may be outlined 
in terms of six objectives: the first four define the 
technical requirement, the fifth focuses on the obliga- 
tion for managers to see themselves as active agents of 
information mastery, the last addresses the question of 
sensing the consequential value added. 

1. Ensure the requisite variety of information acqui- 
sition and use 

2. Match the delivery of information to the cognitive 
need 

3. Support and complement the cognitive process 

4. Provide means for the easy interpretation and trans- 
mission of cognitive action 

5. Ensure that management is fired by the desire for 
information mastery, and is equipped with the 
mental skills for its delivery 

6. Integrate the system overall to deliver optimal 
informatic value added. 
Each of these objectives brings its own set of 
design and implementation problems: 


1. Requisite variety and information overload 
The supply of the requisite variety of information is 
likely to lead to information overload unless the 
information is properly filtered, prepared and pre- 
sented to the user’s cognitive interface. Otherwise 
the user is likely to reduce the bandwidth of the 
input to what seems to be immediately useful and 
supportive of hunches. The Information Audit can 
be deployed to define requisite variety in organiza- 
tional and operational terms and then to evaluate its 
impact. 


2. The cognitive interface and information 
compression 

Human limitations for the acceptance and compre- 
hension of multiple data feeds introduce a critical 
bottleneck in the informatic system. This leads to the 
problem of compression whereby the meaning and 
implications of the information are transferred some- 
how in full without reduction or simplification. The 
design of information delivery and display is a criti- 
cal matter and should extend to include all aspects of 
information scheduling and presentation, the HCI 
and multimedia methods. 


3. The cognitive process and information 
matching 

After acquisition the informatic system has several 
tasks related to the filtering, preparation, scheduling 
and presentation of information to the human agen- 
cies. These tasks may be thought of to good effect as 
matching information to the human cognitive need 
(rather than just dumping packages on desks and 
screens). 
5. Management and information mastery

The human occupiers of an informatic system are 
individual managers and knowledge workers with 
their personal mind-sets, not an abstract ‘management’.
Consequently Objective 5 spotlights the obligation
on the humans inside an informatic system to
equip themselves for delivering information mastery. 

Information mastery is unlikely unless managers 
are fired with enthusiasm for it, and are equipped 
with the skills necessary to deploy the informatic 
system's capabilities to full effect. It is a joint effort: 
a perfect informatic system in machine terms will be 
ineffective unless the human agents want it and use
it. The managers' brains are inside the informatic
system; consequently the design necessarily involves
cost-effectiveness tradeoffs between human and non- 
human attributes. 

This means that managers may require both 
indoctrination on the need for information mastery 
and some training in what might be termed best 
practice thinking and information handling. But the 
The change might be induced by referring to the
manager's role as the most important part of the 
corporate informatic system: the manager is a neces- 
sary part of it, and its deliverables are only as good as 
the cognitive skills put into it. The system is not just 
IT that can be subcontracted to IT professionals. 
unlikely to be used except for facts and figures that
can be quickly assimilated. 


4. Cognitive action and communications 


The results of cognitive action have to be communi- 


cated, via output devices across the interface, into 
the informatic system itself, and outwards into the 
organization and external world. Internal communi- 
cations to the various information sources take the 
form of queries, interrogations, suggestions, orders 
and a formal audit trail (records and memory): they 
establish the vital feedforward and feedback links 
required of an effective system. The requirements 
for each information channel should be matched to 
the best means of closing the loops to ensure speedy 
and correct access without interfering with the hu- 
man agent's continuing thought processes. 

A pictorial summary of an organized informatic 
system is shown below in figure 4. It suggests the 
main components of an informatic system and their 
relationship. 
Figure 4: The organized informatic system


attitude change should not be attempted from the 
conventional standpoints of technology-driven IT 
or methodology-driven Information Science.
Techno-phobia might be eliminated by the excite- 
ments of, and the involvement with, information 
mastery. 
An agenda for information mastery has been
summarized. There is nothing particularly novel 
in this short paper: almost every point made in it 
has been mentioned elsewhere in the information 
literature. But what may be new is the suggestion 
that information mastery is the underlying aim of 
human business and organizations and that its 
possession is a critical factor for success.
Information mastery is the emergent property that 
occurs when IT, information management and 
human cognition all work together within a well 
tuned system. 

All that is left to do is to change the culture and 
to get on with it! That will happen (i) when 
information delivery meets information mastery 
requirements, and (ii) when the vital nature and 
value of information is fully appreciated at execu- 
tive level. 
6. Informatic systems and value added

It should go without saying that effective business 
management requires information mastery, as well 
as good strategy, organization, cost control, 
competitiveness, customer friendliness and so on. 
In fact all the attributes in that long tail of desirable 
business objectives depend on information mastery. 
However nothing much will happen about the 
explicit implementation of business information 
mastery until the management culture fully 
appreciates the value of information to themselves 
and to their businesses. The military appreciate 
this value (because if they don't, they're dead) and 
devote large budgets to their C3I systems and
to what might be called information warfare 
(C3I = Command, Control, Communications and 
Intelligence). 

It is quite easy to point out the consequences of a 
lack of information: just take Figure 4 and start 
removing the feeds from the information resources 
to the cognitive system: with these removed just how 
long will the business or the manager survive? That 
is the real value of information-in-general: in monetary
terms it can be equated to the long-term survival
and earning power of businesses (long-sighted shareholder
value). It is similar to the military argument: if
you don't have it — your side loses the battle, and you 
are dead. This is an emotional argument, but it is 
worth driving home. 

The argument can be supported by looking again 
at Figure 4 and noting that the value added covers 
everything from information acquisition to the out- 
put from managers’ brains. The costs of all the 
components in the informatic system are covered: 
computers, databases, networks, special software, 
knowledge workers, and managers. But the output 
from this system configures the strategy and 
organization, finds the resources, and sets up and 
drives the operations. All of this happens before any 
positive cash flow is generated. But even more im- 
portant: it is the quality of the informatic system that 
determines the business's competitiveness and long 
term survival. 

A detailed accounting of the value of the 
informatic system may be envisaged resulting from 
an extended information audit which covers the 
informatic system as a whole and assesses its im- 
pacts with respect to the requirements of information 
mastery. Such an audit would include explicitly the 
quality of the match between (i) the delivered infor- 
mation and the cognitive needs of the managerial 
functions in place, and (ii) the adequacy of manage- 
rial mind-sets with respect to information mastery. 
Apart from the problems of carrying out a full ac- 
counting of information value, there may be political 
danger if managers feel uncomfortable about their 
personal performance becoming accountable inside 
an informatic value chain. But that fear will largely 
disappear if the culture embraces wholeheartedly the 
need for information mastery. 

must be ready to seize them. Otherwise, we will
weaken our ability to enrich the quality of business 
practice in the future. Plus ça change, plus ça doit
changer!

In the later 1990s, this paradox will come into 
sharper focus as new ways for people to correspond 
and converse with one another come into widespread 
use. The marriage of the computer with telecommu- 
nications means that more information will move at 
electronic speeds and reach vast audiences. One 
effect will be to decentralize power as it decentral- 
izes knowledge; another, to create an uncertain 
balance between the technical and the social, be- 
tween the inexorable and the immovable. On the one 
side, there are the forces driving, and driven by, 
technological change; on the other, there is an aver- 
sion to undue risk, an aversion deeply-rooted in 
human values. From this friction comes the diver- 
gent business structure illustrated in Figure 1. 

How should we respond to the challenge implicit
in this order of affairs? To a degree, our activities can 
continue along proven lines. In the years ahead, our 
trades, crafts and professions can still draw upon the 
research that we and others carry out, and the experi- 
ence we gain. Our purpose can still be to make 
people aware of what we find and learn, and then 
help them to put the results to good use. Our product 
will still be knowledge, and our purpose its commu- 

Paper to the FID H Congress, Granada, November 1


In the years ahead, we will have to answer the 
questions implicit in the title to which FID has asked 
me to speak. How do you find out what information 
business managers need? And how can we best fulfil 
these needs? Let me start by defining what we are 
about in the practice of information as a trade, a craft 
or a profession. 

Our principal raison d’être is to induce worth- 
while innovation. When we succeed, we will see 
improvements in business practice, but we also have 
to allow for an important side-effect. The develop- 
ments we observe and help to introduce often have 
an influence on what we ourselves do, and how we 
do it. Just as people must adapt to new conditions as 
they arise, so must we. And we have to anticipate 
changes that will affect our work, and decide how 
best to cope with them. When these changes outmode 
our existing methods, we must be ready to abandon 
them; when change offers new opportunities, we 

[Figure 1. Surviving labels: 'Technology', 'Nature of'; the rest of the diagram is not recoverable.]


The diagram suggests that there is a danger of
the community we serve dividing into business man- 
agers whose lives are dominated by the factors on the 
left, and those whose lives are dominated by factors 
on the right hand side. In the extreme, this could lead 
to a business community of ‘haves’ and ‘have-nots’; 
or, more precisely, a community split between those 
with plenty of resources and no time, as opposed to 
those with plenty of time and no resources! 


technology, old-fashioned experience, and increas- 
ingly, from information on record. Altogether, this 
knowledge represents intellectual capital. Pope John 
Paul II identified it in his last-but-one encyclical, 
where he defined a new, important form of owner- 
ship and wealth: ‘the possession of know-how, 
technology and talent.’ 


Development of an information economy 

Over the last few decades, we can see that the infor- 
mation revolution has begun to change the very 
source of wealth. It is no longer material, it is knowl- 
edge applied to work to create value. The pursuit of 
wealth is now largely the pursuit of information, and 
the application of information in business and to our 
collective affairs. 

Today, everyone is conscious of the growing 
economic and social significance of these intellec- 
tual assets, but there are no audited appraisals of their 
value. To clarify the point, just suppose all the software
that runs large computers suddenly vanished.
All the lights would go out; all the airlines would 
stop flying; all the financial institutions, and many 
factories, offices and laboratories, would come to a
standstill. Yet these crucial intellectual assets do not 
appear in any substantial way on the balance sheets 
of the world. Those balance sheets are still full of 
what the industrial age called ‘tangible assets’: build- 
ings and machinery, and stocks of goods and 
materials; things that accountants can see and touch. 

The new information economy changes the very 
definition of an asset, transforms the nature of wealth, 
cuts a new path to prosperity. It changes everything, 
from how we make a living to how and by whom the 
world is run. The competition for the best informa- 
tion is vastly different from the competition for the 
best farm land or the best coal fields. Information
resources are not bound to a particular geography,
nor easily controlled. An information economy di- 
minishes the rewards for control of territory and 
reduces the value of the resources that can be ex- 
tracted through such control. 

Businesses and institutions and nations that capitalize
on information will be vastly different from
those that once vied mainly for material resources. 
As a source of wealth, information comes in various 
forms, from streams of electronic data briefly valu- 
able, to years of accumulated research, stored in 
libraries, embedded in computer memories, or car- 
ried as intellectual capital in the collective minds of 
specialists. 

These developments raise difficult questions. How
will we measure capital formation, when much new
capital is intellectual? How will we measure the
productivity of knowledge workers whose product
cannot be counted on our fingers? If we cannot do 
that, how will we track growth in productivity? 


Capitalizing on our intellectual assets 
In many organizations, collective knowledge can be 

nication. All that will persist but, as technology
develops, businesses can easily become data-rich 
and information-poor — engulfed by megabytes of 
disconnected facts, findings and statistics. We must 
avoid that data overload: it will be at odds with our 
mission. 

In an economy where the only certainty is uncer- 
tainty, the one sure source of lasting influence is 
knowledge. When markets shift and technologies 
proliferate, when activities multiply and become 
obsolete almost overnight, successful institutions 
are those that consistently create new knowledge and 
effectively communicate it. These functions define a 
business as a ‘knowledge-creating’ agency, whose 
mission is continuous innovation. 

In that capacity, we are as involved with ideals as 
we are with ideas; the essence of innovation is to re- 
create a changing world in accord with a particular 
vision or ideal. To that end, we require an informa- 
tion strategy that will enable us to redeploy our 
resources and redirect our efforts, whenever neces- 
sary. We require a strategy that will enable us to 
refresh our knowledge, and communicate it, against 
a shifting backdrop of technical advance and social 
adjustment. That presents us with a new challenge, 
not merely to increase our efficiency but to capitalize 
on our intellectual assets, so that we can deepen our 
vision. 


Our intellectual capital 

Advances in technology are changing the way peo- 
ple work, and indeed the nature of the work they do; 
changing how individuals and groups live, work and 
interact with others; changing the basis for differentiation
and competition. Around the world, many
businesses now find that they can use the knowledge 
they possess to secure a differential advantage. These 
organizations are becoming players in the global 
market for information, which has moved from rheto- 
ric to reality almost before we knew it. To profit from 
our intellectual assets, we must enter this market. To 
do so, we need first to understand its character and 
the mainsprings of its growth. 

To an extent, competitive differentiation will 
revolve around an intensification of analysis. The 
astute will shift their attention from systems to infor- 
mation; and they will address two related questions. 
In a competitive world, where groups can have ac- 
cess to the same data, who will excel at turning data 
into information? Who will then analyse that infor- 
mation quickly and intelligently enough to generate 
superior knowledge? As answers to these questions 
emerge, the pace of change will quicken. Before this 
decade ends, the nature of information, how it is 
traded and produced, the scope, shape and protocols 
of information markets, and the other attributes of an 
information economy, will impact policy, set limits 
on influence, and redefine power. 

The knowledge we can deploy will come from 
the products of analysis; from research, learning, 

through many sources. Many researchers are trying
to overcome these limits with new models for infor- 
mation-rich systems. They recognize that the rules 
and customs, the arts and talents necessary to un- 
cover, capture, produce, preserve and exploit 
information are now mankind's most important 
rules, customs, arts and talents. 

One useful model assumes that the exemplary 
system closely resembles a symphony orchestra.
All the instruments play the same score, but each 
plays a different part; they play together but they 
rarely play in unison. There are more violins than 
horns, but the first violin is not the ‘boss’ of the 
horns, nor even the 'boss' of the other violins. 
Within a set programme, the same orchestra can 
play many different pieces of music, each entirely 
different in its style, its scoring and its solo instru- 
ments. 

An information-intensive structure permits, and 
may require, 'soloists' with many different 
specializations in all areas. These soloists function 
like a pianist playing a Beethoven concerto. Both 
the ‘pianist’ and the ‘orchestra’ around him — that 
is, the rest of the organization — can function only 
because both ‘know the score’. It is information not 
authority that enables them to support each other. 

In the orchestra, both the players and the con- 
ductor receive the score in advance. In an 
information-rich enterprise, the ‘conductor’ and the 
‘instrumentalists’ extemporize the ‘score’ as they 
play it. To know what the score is, everyone in the
organization has to manage by objectives that they 
clearly understand, and agree in advance. There has 
to be shared understanding, shared values, mutual 
respect. 

Management by objectives and self-control are, 
of necessity, the integrating principles of the infor- 
mation-based structure; they rest on responsibility. 
The information flow is circular, and the system can 
function well only if each individual and each unit 
accept responsibility for their goals and their 
priorities, for their relationships, and for their com- 
munications. 

Tight self-discipline, in turn, allows fast deci- 
sions and quick responses. Advanced information 
systems permit both great flexibility and far greater 
diversity and make it possible, for instance, to have 
parallel streams of activity within the same corpo- 
rate structure. It can contain purely managerial 
units, that attempt to optimize what exists, and 
entrepreneurial units, charged with making obso- 
lete what exists and with creating a different 
tomorrow. 

In these units, each individual and group have to 
ask: ‘What should the organization expect of us, 
and hold us accountable for, in terms of perform- 
ance and contribution? Who in the organization has 
to know and understand what we try to do, so that 
both they and we can do our work? On whom in the 
organization do we depend for what information, 

hard to identify, and harder still to plot effectively;
the management of intellectual capital is still largely 
uncharted territory, and few are expert at navigat- 
ing it. Managing know-how is not like managing 
cash or buildings, yet intellectual investments need 
to be treated with at least as much painstaking care. 

In this process we can learn from techniques of 
knowledge creation and management pioneered by 
the Japanese. They have shown that genuine knowl- 
edge is seldom structured in fixed fields or by 
dependable rules. It is social — that is, it is distrib- 
uted as shared understanding among people who, 
contrary to their national culture, may have little 
respect for artificial organizational boundaries. In- 
novative managers in Japan’s most successful 
keiretsu recognize that they need new tools to ar- 
ticulate, organize and share such bona-fide 
knowledge. Living knowledge will emerge when 
people can connect their perspectives and defuse 
their collective prejudices, their blind spots and 
their unfounded assumptions. 

This collaborative exploration cannot be auto- 
mated or controlled by the old military mechanisms 
of organizational management — the hierarchy, the 
chain of command, the delivery of cut-and-dried 
marching orders. Nor will hardware systems and 
factual databases meet the need. Too many organi- 
zations are drowning in a sea of high technology 
without insight or content. Creating knowledge is 
not simply a matter of processing objective infor- 
mation; it depends on tapping the tacit and often 
highly subjective insights, intuitions and ideals of 
individuals and groups. That is our strategic pur- 
pose: to capture, capitalize and lever this 
free-floating brainpower.


An appropriate organizational structure 

The greatest challenge for the manager of intellec- 
tual capital is to develop an organization that can 
share knowledge. How best to do that? The world 
lacks a model that schematizes information’s forms 
and functions. Admittedly, even without such a 
model one point is clear. When the world’s most 
precious resource is immaterial, the economic doc- 
trines, social structures and political systems that 
evolved in a world devoted to the service of matter, 
become rapidly ill-suited to cope with the new state 
of affairs. 

Today, the typical organization still manages 
information very much like a refinery processes 
oil. Crude comes in and highly refined products go
out: an oil major's crude product becomes super- 
unleaded; a corporate planning unit assembles an 
analytical database of transactions, and uses it to 
prepare management reports. 

The refinery analogy is easy and apt, the basis of 
clichés about ‘turning data into information’. In 
reality, there are limits to this process. It can be 
difficult to turn raw data into useful facts; often the 
information is unstructured, complex, and scattered 

their users and feedback from them. We need a
similar strategy, but not just with users. We must 
give comparable attention to the links between our 
users and our 'funders', and between our funders and 
the providers of our information services. 


Our funders 

A business's funders and fund managers include:

• the shareholders and institutions, which provide
  fixed and fluid capital
• the board of directors, including non-executives
• the management team
• outside suppliers, on whose capacities, skills and
  credit the business may draw
• those who place awards, grants and contracts.

Together, they allot funds and allocate resources,
and they set evaluation criteria and constraints that
will govern the agreed strategy. The funders and
fund managers are also concerned with monitoring
performance in the context of the business's strategic
and financial objectives.


Our users 
Users of our information services can include:

• line and staff functions
• research managers and research teams
• customers of the business, and consumer groups
• organizations and individuals with an academic or
  professional interest in the business
• the media.


Our information services

Our services may draw upon information and media 
specialists, and external suppliers of services and 
expertise. Among these suppliers may be publishers, 
database hosts, contractors, agents, consultants, tech- 
nical advisers, and other libraries and information 
services. 

what knowledge, and what specialized skill? Who
in turn depends on us for what information, knowl- 
edge, specialized skill? Whom do we have to support 
and to whom, in turn, do we look for support?’ 

When the players can answer these questions, 
an organization can reshape its managerial struc- 
ture around the flow of information. Management 
structures are already subject to root and branch 
change. Layers of management that used to do noth- 
ing but relay information from one level to another 
are beginning to disappear; these positions are no 
longer needed now that information technology al- 
lows the rapid transfer of vital information to all 
levels of management without human intervention. 

The old model of hierarchical organization is giv- 
ing way to flatter structures designed for the faster 
response times needed to meet dynamic challenges on 
an international scale. Admittedly, the new organization 
chart may look perfectly conventional. In truth it is 
not; these new types of organization behave differ- 
ently and demand different behaviour from those 
involved in them. Once this occurs, the 'organization of
the future' can take a practical form in which informa- 
tion serves as the axis and as the central structural 
support. Managers gain better access to other people's 
analyses, which ultimately becomes understanding.


Translating theory into practice

A logical way to build understanding is to encourage 
users to participate in the shaping of programmes 
and the communication of results. This collaborative 
approach will become feasible and productive as the 
spread of technology prompts us to take a different 
course. 

In the next five years or so, communications will 
probably make less use of paper products and more 
of electronic services. By their nature, services entail 
greater interaction with users; and most service sup- 
pliers pay close heed to their communications with 

that, without extensive market investigations, we
will have only a remote view of real funder and user 
requirements. 

We can, of course, uncover these requirements 
through field studies, sample surveys, personal in- 
terviews and other means, although the methods 
will depend upon the nature and size of the target 
groups. It is worth taking care to select research 
methods; insufficient or faulty market appreciation 
is often the principal cause for disappointment with 
new forms of information service. 


Alliances and partnerships 

Given this complexity, estimating demand for our
information services, and exploring that demand
thoroughly, will call for substantial investment. When
knowledge is the main ingredient in a service, up- 
front costs tend to be very large; marginal costs, 
relatively low. To minimize our investment, we 
could form temporary alliances, so that we can 
engage with different allies or partners in as many 
kinds of knowledge creation and communication 
activities, as we need and find feasible. 

Prospectively, information technology will en- 
able us to develop new forms of collaboration. One 
of the most intriguing is the information alliance, 
facilitated by the sharing of knowledge over net- 
works. Through an information alliance, we can 
work with diverse organizations, and together offer 
novel incentives and services, or participate in com- 
bined marketing programmes. Jointly, we could 
take advantage of new channels of distribution and 
realize operating efficiencies. 

From our standpoint, alliances and partnerships 
should open opportunities for scale economies and 
cross-selling, and enable us to reach audiences 
otherwise beyond our grasp. In addition, these 
arrangements could provide us with a new basis for 
differentiating our services. 

These relationships are, in effect, inter-organi- 
zational information systems, which are becoming 
popular for various reasons. For instance, busi- 
nesses have found it increasingly difficult to earn 
acceptable marginal returns, simply by investing in 
internal information systems. Internal systems have 
matured, and we must look for new growth paths 
elsewhere. 


Strategies and tactics 

Profiting from information technology 
Transnational organizations are therefore pursuing 
a further line of development, as a key element in 
their corporate strategies. Many are turning away 
from an orientation towards products and production,
which depend upon internal systems. Instead,
they are adopting a service and market orientation, 
with links to funders and users, allies and partners. 
Consequently, enterprises and organizations are 
increasingly intertwined and integrated, both 
internally and externally, as is evident from the 


Communications in three dimensions

Figure 2 illustrates the three-way communications 
process between these functional groups that should 
be at the heart of our strategy. The diagram draws out 
connections that we should try to strengthen. In 
particular, our information strategy should reinforce 
the enthusiasm and support of our funders, and fa- 
cilitate communications between our funders and 
our users.

As emphasis shifts from information products 
to information services, we will also need to try 
different tactics. For products, the best approach is 
usually 'promotion-oriented'; we endeavour to create
awareness of, say, items in a catalogue, or new
publications, reports and initiatives. For services, 
the most effective approach is 'development-ori- 
ented'; that is, we spend most of our effort on 
discovering the wants of a target audience, and then 
creating information services to satisfy them. 

The first approach assumes relevant products, 
and focuses on communications that inform users 
about them. The second approach puts the empha- 
sis on a joint effort by providers, funders and 
users, to create useful services attuned to demands 
and needs. 


Target audiences and ‘constituencies’ 

When we assess these demands and needs, it will be 
important to distinguish between 'academic com- 
munication' within the research community only, 
and 'extension communication' with non-academic 
professionals, practitioners and social partners. 
Experience suggests that both are essential; in prac- 
tice, communications with extension audiences are 
most effective when they are aware of strong refer- 
ence group support in universities and public sector 
administration. 

Our strategy also has to take account of social 
and technological pressures in a complex environ- 
ment. On occasion, a single new idea generated by 
an individual may cause a radical change; more 
often, we have to work with diverse cosmopolitan 
groups to attain our goals. 

I illustrated the interplay of forces in Figure 1. 
Those pressures create an elaborate 'socio-technical 
constituency', within which we must communicate 
our research findings. Our domain blends the 
technical with the social; the expertise, technology
and structures with the values and attachments of 
interest groups. Collectively, these constituents 
develop, produce and communicate knowledge on 
particular changes. 

Such groupings are never static; they are always 
evolving and altering their mix in ways which re- 
flect a growth or decline in demand. Their 
interactions may be competitive, and inject a sense 
of urgency and threat; or they can be collaborative, 
or a combination of both. The inter-relationships
are many-sided, and the sheer number of interfaces 
in this socio-technological system inevitably means 

international strategic alliances in new fields, and
professional inter-organizational networks that 
develop and promote change, to name just a few. 

Informal communications hinge on relatively 
loose, implicit, flexible interrelationships, which 
are essential to the process of creating demand. 
They facilitate interaction and information flow, 
and foster private and social contacts — contacts 
which often go beyond those that involve informa- 
tion. Ideally, the network's pool of knowledge is 
accessible to every user, and allows them to exploit 
economies of scale as they generate, communicate 
and experience information. 

When they decide to buy into information and 
network services that entail a substantial commit- 
ment of time and money, most users and their 
funders prefer personal independent reference 
points, and have more confidence in them. A study
by Murray points to a prominent referring role for
opinion leaders and early service adopters. 

There is a body of research which suggests how 
we should address this market. We should develop 
persuasive communications that stress productive 
experience more than technical features. We should 
employ 'word-of-mouth simulation' in our marketing,
selecting spokespersons and endorsers who are
similar to our target prospects. To stimulate de- 
mand, we should emphasize tangible benefits to 
encourage trial, and we should motivate opinion 
leaders and early adopters to evaluate the quality 
and relevance of our services, and acquire experience
of them. Murray's study also underscores the
need for continuing effort to sustain use of services. 


Physical networks 

The social concept of networking is now almost 
inseparable in some environments from the physi- 
cal networks that support it: local area networks, 
internets and electronic conferencing systems. Mar- 
riages are being announced every day between 
computers, fax machines, modems, cellular phones, 
phone companies, cable networks, and electronic 
information providers. The distinction between hard 
copy and soft copy is rapidly blurring. 

Efficient networks can foster remote conversa- 
tions that are the electronic analogue of the 
proverbial coffee-break discussion, in which seri- 
ous matters are often communicated and sometimes 
even resolved. The network can convey significant 





jargon; usages such as 'outsourcing', 'strategic 
alliances’, ‘joint ventures’, ‘networking’, ‘computer- 
shared cooperative working’ are in almost everyday 
parlance. 

This growing demand for inter-organizational 
contact has been matched by the liberalization of 
the telecommunications industry, and the develop- 
ment of collaborative working tools, like Lotus 
‘Notes’. That has meant lower prices and better 
service — and wider use of hardware, software and 
standards. Organizations desiring to exploit infor- 
mation technology strategically are focusing on 
inter-organizational information systems at the 
expense of internal systems. 

These developments encourage collaborators to 
exchange volumes of electronic data precisely, in- 
stantaneously, and relatively cheaply. There have 
been significant improvements in the price and 
performance of systems delivering electronic
databases and services to external parties over the
past few years. Higher computer speeds and cheaper
mass-storage devices mean that information can be 
archived, cross-correlated and retrieved as never 
before — and in ways that may be tailored to recipi- 
ents’ needs. New fibre-optic networks are greatly 
improving delivery to remote locations.

Faster delivery is also stirring interest in novel 
‘just-in-time’ approaches to information storage 
and retrieval. Advocates of JIT argue that records, 
publications and documents held for ‘insurance’ 
purposes remove the incentive to make the best use 
of external sources. Nowadays, our information 
services can rely more upon the vast range of 
electronic information available at a few taps of a 
finger. 

The dynamics of information demand and 
supply are also shifting. Information managers now 
find that, like money in a mattress, intellectual 
capital is unproductive unless it moves. By 
developing ways to make knowledge move, 
an organization can create a value network, not just 
a value chain. 


Informal networks 

An informal network can be an efficient inter-or- 
ganizational set-up for creating a social pool of 
knowledge about prospective change in business 
conditions, or other topics. The network can then 

and provide prototypes of tools likely to be use-
ful in the management of knowledge in the field 

• a ‘database of databases’ should function as a
  core reference source, and give access to the
  principal data collections in the area

• both these developments can benefit from systems
  to tag multilingual texts and data sets, using a
  standard markup language.

SGML, the Standard Generalized Markup Language,
enables the user to describe the essential
features of simple or complex documents without
becoming a hostage to the proprietary standards
imposed by software vendors. A document created
in or converted to the SGML format can be output 
in many forms on many types of systems. 

Far more important, we can describe, or mark up,
the structure of a document — the logical organization
of its constituent elements — in such a way that the
description is never lost, no matter what the eventual
destination of the document. The payoff is informa- 
tion that can be widely shared. 
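
The principle of structural markup can be sketched in a few lines of code. The example below is an illustration under stated assumptions, not a description of any particular SGML system: Python's standard library has no SGML parser, so the sketch uses a well-formed XML fragment (XML being a later, simplified profile of SGML). The element names `report`, `section`, `heading` and `para`, and the two output functions, are hypothetical. The point is only that one marked-up source yields several output forms without losing its logical structure.

```python
import xml.etree.ElementTree as ET

# Hypothetical marked-up document: the tags describe structure, not appearance.
SOURCE = """
<report>
  <title>Information needs of business managers</title>
  <section>
    <heading>Our funders</heading>
    <para>Funders allot funds and set evaluation criteria.</para>
  </section>
  <section>
    <heading>Our users</heading>
    <para>Users include line and staff functions.</para>
  </section>
</report>
"""


def to_plain_text(root):
    """Render the whole document as indented plain text."""
    lines = [root.findtext("title").upper(), ""]
    for section in root.findall("section"):
        lines.append(section.findtext("heading"))
        for para in section.findall("para"):
            lines.append("  " + para.text)
    return "\n".join(lines)


def to_outline(root):
    """Render only the section headings, e.g. for a contents list."""
    return [section.findtext("heading") for section in root.findall("section")]


root = ET.fromstring(SOURCE.strip())
plain = to_plain_text(root)      # one output form: readable text
outline = to_outline(root)       # another output form: a contents list
print(plain)
print(outline)
```

Both outputs are derived from the same description of the document, which is why the marked-up source, rather than any single rendering, is the thing worth sharing.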


Monitoring 

Our success will depend on critical factors and key 
activities that must function effectively if we are to 
succeed. We will need identifiable criteria to evalu- 
ate our performance, so that we can take corrective 
steps when necessary. 

For example, we could gauge our performance 
with such measures as 
• the flow of new users, as a proportion of our total user population
• discontinuation of use by current users
• user-initiated changes to current arrangements
• disagreements or complaints
• assessments of the quality of services.

Clearly, we must also keep a close watch on our 
own costs — and those of our suppliers, allies and 
users — of initiating, developing and sustaining 
services. And we will need rigorous studies of the 
impact of our efforts and the results of our expendi- 
tures. Performance evaluation will need further 
investigation as we develop our information strat- 
egy and mature our opportunities. 

The questions I have addressed in this presenta- 
tion are broad. The answers I have offered may 
seem far-reaching and singular. But what we must 
set about is a dialogue. We must work together to 
understand what information business managers 
need, and how we can fulfil those needs. 

In practice, that will mean that we should be as 
absorbed with ideals as we are with ideas. To that 
end, we will have to redeploy our skills and redirect 
our efforts whenever we need to. We will require a 
strategy that will refresh our understanding, and 
communicate it, against a shifting backdrop of tech- 
nical advance and business response. That will present 
us with a new challenge: we will need to capitalize on 
our intellectual assets as generalists, so that we can 
deepen our vision as specialists. We will need to 

those costs over time, and partly on the view users
have of the advantages that participation offers. 
Once participation in a network gains momentum, 
the problem is to translate that momentum into a 
self-sustaining evolution. The question then is: will 
our networks have enough support to function on a 
realistic scale? 

Several factors can lessen this critical mass prob- 
lem. The first is the reputation, complementary 
assets and capabilities of the entrepreneur who cre- 
ates the network. The greater they are, the lower the 
critical mass, and the easier it will be to reach it. 
When these factors are positive, it is easier to convince
the initial core of innovative users to form a
network, despite uncertainty. As membership grows, 
the network can provide a richer range of contacts 
and spread fixed costs to give economies. 

Success can then breed further success. A tradition
of cooperation among the parties can stimulate
demand for a chain of new opportunities and net- 
works, so that success at one stage reduces the 
critical mass required for another. The application 
of IT helps to minimize the critical level of effort 
required to materialize new market opportunities. 

In practice, the collective network is like a 8 
mind’ in which personal knowledge and intelli- 
gence come from a single node. And big thoughts 
are brewing there — a kind of massive parallelism 
with a human face. This point is pivotal because 
without social interchange and human connections, 
none of the rest is possible; people do not interact
for very long in fixed-field datapoints. We must 
become part of that social process, and determine 
how best to evaluate and exploit these new tech- 
nologies to increase the effectiveness of our 
communications. 


Packaging 

Effectiveness in the transfer of knowledge from 
origin to use also requires 'packaging' by skilled 
communicators. We must develop a close under- 
standing of our constituencies, and present our 
conclusions in a format and terminology which are 
simple for users to follow. 

When we come to depend on partnerships, we 
must recognize that information has to be packaged 
for all partners by all partners, and that calls for 
agreement on data definitions, formats, relationships, 
structures and vocabularies. We have to define com- 
mon procedures and common standards for systems 
development and maintenance, and develop com- 
mon codes for databases, records and electronic 
communication. We must adopt articulated proce- 
dures for surfacing conflicts, resolving perceived 
inequities; and, when necessary, we should rethink 
the terms of our partnerships. 

Recent developments, as we pursue them, will 
advance these elements in our information strategy: 
• experience with text handling and processing
will help us to create authoritative vocabularies,


May 1995, Aslib Proceedings 


4. Proposed by Peter Drucker, the management 
guru. 

5. BROWN, R. (US Commerce Secretary), Chair- 
man Information Infrastructure Task Force report 
Washington DC, September, 1993. 


6. BESSEN, J. Riding the marketing information
wave, Harvard Business Review, September-October,
1993, pp.151-160.


7. MURRAY, K.B. A test of services marketing 
theory: information acquisition activities, Journal of 
Marketing 55 (1), January, 1991. 


Aslib Proceedings, vol.47, no.5 


How best to find and fulfil business information needs 


strive anew, to reason anew; to settle the doubts of 
today and win tomorrow. For if we do not, who will? 

Abstract


This paper examines the prevailing practices of preservation in the Library of the University of Cape Coast, 
Ghana. A brief description of the library, its location and stock is provided. Difficulties faced by the library are 
also discussed. Recommendations made take cognisance of the special problem of finance. The paper advocates 
the preservation of library materials to be seen by the library staff and management as an integral part of library 
practice. To this end, the inculcation of a preservation culture in both staff and users is emphasized. 


added problem of the effects of the salt sea breeze on 
the library's resources. 

The present building, in use since October 1962, 
was meant to be temporary. The University has 
since expanded and the temporary building has 
remained. With the increase in library stock, use- 
able space in the library has grown smaller and 
smaller. The resultant problem is acute lack of ac- 
commodation. Statistics show that the library was 
originally built to house 50,000 volumes.? To date it 
has more than 170,000 volumes. The library therefore
houses more than three times the number of
volumes of its original capacity. Due to the space 
problem, library materials dealing with Science and
Agriculture were moved to the topmost floor of the
Faculty of Science complex. Also a good number of
back issues of periodicals are housed in a room at the 
Faculty of Arts. This is an obvious indication that the 
library is bursting at the seams. This state of affairs 
certainly has telling effects on library materials. 

The library stock has simply outgrown the li- 
brary. It consists predominantly of paper documents, 
such as books, bound volumes of periodicals, maps, 
manuscripts, theses and photos. A small proportion 
of the library's holdings is made up of non-paper
documents, such as photographic prints and films, 
computer records, audio and video tapes. There are 
also electronic gadgets (computers, photocopiers and 
microfilm/fiche reader). All of these need a great 
deal of care and attention as well as correct handling 
in order that the useful life of these items will be 
maximized. 

The building was originally built with central air- 
conditioning. This has long since broken down. The 
effect of the absence of the air conditioning system 
on the stock is obvious since windows are few. 
Ceiling fans have however been fixed to lessen the 
deleterious effects of a jammed library with little air 
circulation. Certain parts of the library have air con- 
ditioning. This is where most of the non-print 
documents and electronic gadgets are housed. 

Introduction

Preservation appears to be a term that defies precise 
definition, and its meaning has changed over the 
years. Rose Harvey asserts that ‘publications before 
the early 1780s may not use these terms in the same 
sense as the current terminology does.’

To many librarians preservation is synonymous 
with old books and binding but preservation involves 
far more than that. Preservation is ‘any activity, largely 
preventive aimed at protecting and securing library 
materials to ensure their availability, access and 2 
Preservation would have meaning to librarians/information
scientists if we ceased to see it as an antiquarian
exercise for keeping objects from the past simply 
because they are old. Only then would it be given the 
priority it deserves. In this age of information technol- 
ogy, preservation is better understood to be a 
managerial tool for making information available to 
users. With this understanding of preservation, the 
librarian is set to meet the challenges of preserving his
or her holdings, not only for posterity but for the 
effective dissemination of information now. 

Furthermore, preservation is the conscious effort 
to ensure that materials are always in a state of use 
and that the useful life of each item is maximized. 
Carrying this out includes the consideration of 'stor- 
age and accommodation provisions, staffing levels, 
policies and techniques and methods involved in 
preserving library and archival materials and the 
information contained in them.'* 

It is with this understanding of preservation that 
the writer would like to examine preservation practices
prevailing in the University of Cape Coast
(UCC) library and to make practical suggestions for 
improvement. 


UCC Library 

The UCC Library is located in a hot, humid region of
tropical sub-Saharan Africa. This makes it impossi- 
ble for the preservation of its stock to be ignored. 
Since the building overlooks the sea, there is the 

this fashion, their lifespan is shortened prematurely,
mostly through physical impairment. A lot of 
creativity and boldness is thus called for to be able 
to make any significant progress towards a preser- 
vation programme. 

Lack of a preservation culture is another prob- 
lem. There is hardly any awareness amongst staff or 
the users about the need for preservation. Until now, 
preservation has not been considered an integral 
part of library practice. The result is apathy and
ignorance of preservation, so that it is not
consciously practised or given the priority it needs.
It is observed also that no plans or equipment
are in place to avert the ill-effects of natural
disasters such as floods and fires. In all fairness, the 
difficulties militating against the effective preserva- 
tion of the UCC library stock are genuine and quite 
overwhelming. 


Recommendations 

The writer is of the opinion that despite obvious 
difficulties progress can still be made. For this 
reason, the paper provides workable and practical 
suggestions for implementation. Indeed preserva- 
tion is a very costly business, but certain low cost 
measures can be implemented to great effect. The 
more technical and expensive measures should 
nevertheless be tackled in the long term. A humble
start towards effective preservation of the
library stock could be made with the library assistants,
cleaners and janitors, who have direct contact with
the materials on a daily basis. This is all in a bid to
inculcate preservation in UCC library life. It will be 
helpful too if constant reference is made to it at staff 
meetings.’ 

Looking at the contents of the User Education 
Programme of the library, the writer observed that 
little reference is made to the role of the users in the 
preservation of library materials, particularly the 
handling and care of these valuable resources. The 
orientation given users must be exploited to create a 
preservation culture in them. 

Another avenue which can be used for this same 
purpose is the Library guide. This is a printed book- 
let given to users instructing them on how to use the 
library. It is suggested that specific reference be 
made in this Guide to the need and importance of 
preservation and what part the user of the library can 
play. Library clientele have a significant part to play 
in the preservation of library materials. Users’ ability 
to handle library materials appropriately and care- 
fully would ensure that materials are protected from 
the excesses of physical stress, which careless use 
can impose. 

Shelf-reading once a year has been the practice. 
The writer wishes to recommend that this be done 
once every six months. The library assistants who 
undertake this task should clearly be instructed on 
what to look out for. Until now, shelf-reading has 
been done solely to correct the arrangement of books 

Prevailing practices of preservation

As the desire to preserve is instinctive to man, pres- 
ervation practices of some sort are carried out in the 
library. The writer identifies that cleaning is done 
daily. This is not thorough due to shortage of staff 
and lack of modern equipment such as vacuum clean- 
ers. Once a year, thorough cleaning is undertaken but 
this approach is insufficient to take care of the dust 
menace permanently. The age-old practice of the 
binding of worn out and torn books is another method 
of preservation, carried out by the bindery section of 
the UCC Library. 

Photocopying has been resorted to sometimes to 
preserve valuable materials whose lives have reached 
their end. Also, from time to time the entire library 
is fumigated by the Sanitary Section of the UCC but 
this is not professionally done. Fumigation requires 
the use of chemical pesticides. Due to the toxicity of 
these chemicals, books requiring fumigation are iso- 
lated and treated in a sealed chamber or other enclosed 
space. What happens in the library is that fumigation 
is done by spraying all books while they are still 
upright on the shelves. The ensuing toxic fumes are 
a hazard to library staff and users for days. It is 
important also for the librarian to know the pesticide 
being used but unfortunately it is all left to the 
discretion of the Sanitary Section. Whatever chemi- 
cal is available is used. 

Attempts are being made at microfilming such 
documents as newspapers and periodicals for pres- 
ervation purposes. This has not yet taken off. 


Difficulties faced by the library 

The library is no doubt faced with genuine difficul- 
ties which cannot be overlooked. There is the age-long 
problem of lack of funds which hampers significant 
progress. Over the years budgets have been reduced
drastically. This has resulted in preservation being 
pushed to the background. There is also the twin 
problem of escalating prices of books. This also 
makes it almost impossible for proper attention to be 
given to preservation, since priority is placed on 
augmenting the library stock. For the past number of 
years, due to foreign exchange constraints, acquisi- 
tions have been made for the library by the Ministry 
of Education with money provided as loan by the 
World Bank. It is worth noting that no provision has 
been made for the preservation of these materials. 
This is a clear indication of the low priority given to 
preservation. 

Inadequate space is another problem that the 
UCC library has to contend with. Lack of space 
results in books being tightly packed on shelves. In 
some instances books are placed on top of each 
other or left unshelved, in which case they are left 
on trollies. New books which have already gone 
through the cataloguing and classification process 
have yet to find places on the shelves. Due to lack of 
space they may have been packed on tables or on 
top of shelves. When books are handled and kept in 

all sectors of library practice, from the highest levels
... of policy making to the most specific areas of 
clerical activity in the long term.'? The writer agrees 
with this observation with specific reference to the 
UCC Library. She wishes to add that significant 
progress would be made towards a more sustained 
and effective preservation practice if the UCC Library
would adopt for itself the preservation mission of 
archivists. The British archivist, Sir Hilary Jenkinson, 

puts it succinctly: 

"The duties of the Archivist ... are primary and 
secondary. In the first place he has to take all possible 
precautions for the safeguarding of his Archives and 
for their custody ... subject to the discharge of these 
duties, he has in the second place to provide to the 
best of his ability for the needs of historians and 
research workers. But the position of primary and 
secondary must not be reversed, 

The key to an effective preservation programme 
in any library requires such a mindset. 

on the shelves. In addition to this, it is suggested that
library assistants should look out for signs of deterio- 
ration, wear and tear, insect infestation, dampness, 
mould, yellowing of pages and all the other telltale 
signs of deterioration. 

For there to be any significant progress with 
preservation at the UCC Library, it is suggested that 
one person be made directly responsible, initially. It 
is recommended that the Librarian delegates this role 
to an Assistant Librarian. His or her role would 
include drawing up a programme for preservation 
and following through with its implementation. He
or she would also be called on to liaise with the 
Sanitary Section of the UCC for periodic and regular 
fumigation of the library. He or she would act as the 
overall coordinator for preservation, in the UCC 
Library. He or she would become the nucleus for the 
establishment of a preservation department. 

As a beginning towards disaster preparedness, it 
is recommended that fire-extinguishers and sand- 
bags be purchased against fire outbreak, and staff 
trained to use them. 

Whatever efforts are already being made to- 
wards effective preservation of library stock should 
be intensified. 

Library staff and management must however 
understand that it is impractical and unnecessary to 
preserve everything forever as libraries do not have 
the space, facilities or money to do this. In view of 
this, the management and staff of the UCC Library 
must learn to distinguish the obsolete and less valu- 
able materials from those of more research value, 
and so discard them. This would create space in the 
library. Weeding out of library materials should be 
pursued boldly and used as a means of preservation. 
It is the researcher's belief that if these rudimentary 
and inexpensive suggestions are implemented, pres- 
ervation will become a part of UCC Library practice. 
This will then open the way for the more technical 
and expensive but necessary elements of preserva- 
tion practice, such as the training of personnel and 
the establishment of a preservation outfit. 


Conclusion 

Harvey observes that the preservation problem 'is of 
such a magnitude and the prognosis so poor, that 
efforts must be redoubled to find practical, effective 
and realistic means of integrating preservation into 

Abstract


Human services are increasingly regarded as a discrete field of study and an area of public concern. However, 
little work has been done outside the USA in determining the make-up of knowledge in the field. This paper 
presents an initial attempt to study what constitutes the knowledge base for the child care profession in the UK. 
To do this the citations of five British child care journals were analysed for 1993. Reference type, country code
and self-citations of journals and authors were recorded and analysed. Comparisons were drawn with an earlier 
citation study of social work. The study confirmed a 1:1 relationship between books and journals and showed the 
diffuseness of the sources upon which child care professionals draw. Although books were the most cited category 
individual books were rarely cited multiply. A core of journals was identified but child care journals were only 
fifth in frequency of citations after books, other journals, grey literature and other child-orientated journals. This 
suggests that the child care profession in the UK is outward-looking, although almost wholly dependent on the 
English language. Pointers for further research are suggested. 


ers’. They suggest that the diffusion of social work 
knowledge amongst practitioners is much more indi- 
rect, subtle and harder to measure than the effect of a 
journal article. They further suggest that it is the 
ideas that constantly appear and reappear in the 
literature that eventually become absorbed by practi- 
tioners into the conventional wisdom. This somewhat 
conflicts with other findings of citation analysis (Baird 
and Oppenheim 1994) which suggest that the classic, 
seminal papers tend not to be cited much as they are 
felt to be self-evident. 

The human services area, however, has a diffuse- 
ness that perhaps is not found in more scientific 
disciplines. Considerations other than purely scien- 
tific ones apply more widely. Both practice and 
policy are more subject to political, social, eco- 
nomic, cultural and legal changes. This will be seen 
in the literature analysed in this paper. 

ii) The journals chosen had to be published 
in Britain and to be mainly for a British 
audience. 

This is not to downplay the significance of other 
important child care journals such as Child Welfare
and Children and Youth Services Review. However,
it is a realistic assessment of the usage of journal
literature within Barnardo's, a large British volun- 
tary agency providing child care services. The lack 
of impact of international literature in the human 
services has been noted by several researchers (Line 
et al. 1971, Slater 1989). Mendelsohn (1984) sug- 
gests that one reason for this is the relative lack of 
coverage of social work journals within online bib- 
liographic databases. Although ASSIA (Applied 

Introduction

Citation analysis plays a major part in understanding 
how knowledge is disseminated throughout profes- 
sions. Much of the research done within this field has 
concentrated on the whole area of social work and 
has been predominantly American. 

The aim of this initial study is to look at how the 
literature on policy and practice in professional child 
care/welfare in Britain is made up. For this purpose 
several journals were selected to form a citation pool 
and the citations made in those journals during 1993 
were analysed. 


Methodology: selection of sample 

Even the selection of such a sample may have to be 
subjective. Baker (1992) admits that his sample was 
selected for convenience. In his case the journals 
were included in the Social Sciences Citation Index 
which provided good inter-relationship data. The 
lack of such treatment for British-based journals 
precluded such an approach for this paper. 

The journals selected for this paper were chosen 
from two main considerations: 

i) They had to be aimed at practitioners 
and policymakers within the child care 
profession. 

The research-practice divide has been clearly 
identified as a significant problem within the social 
sciences since Maurice Line's original INFROSS 
study (Line et al. 1971) and continues in more recent
studies (Slater 1989). Lindsey and Kirk (1992) con- 
clude that periodical literature ‘seems to make no 
direct, easily discernible impression on practition- 

Certainly the analysis showed that it occupies a
different world from the rest of the sample. 


Methodology: categorization 
Citation references were entered into Pipedream, a 
spreadsheet program, and assigned a reference 
type and a country code. Self-citations of both 
authors and journals were also recorded. Codes 
were used for books, journals and publishers for 
later analysis. 
The reference type recorded the type of literature 
thus: 

a) child-orientated social welfare journal 

h) other child-orientated journal 

b) social welfare journal 

c) other journal 

d) book/monograph 

e) grey literature 

f) statute 

g) bibliography 

i) audio-visual material 


Following data input, frequency counts were made 
and comparisons were made against Berman's and 
Eaglstein's earlier citation analysis of social work 
(1994). This study was chosen for comparison due 
to its closeness of subject and chronology. 
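
The coding and counting step described above can be sketched in a few lines of Python. This is only an illustration of the general technique, not the Pipedream spreadsheet the study actually used: the record layout, the function name `frequency_counts`, the country codes and the sample records are all hypothetical, and only the reference-type letters are taken from the list above.

```python
from collections import Counter

# Reference-type codes as listed in the categorization scheme above.
REFERENCE_TYPES = {
    "a": "child-orientated social welfare journal",
    "h": "other child-orientated journal",
    "b": "social welfare journal",
    "c": "other journal",
    "d": "book/monograph",
    "e": "grey literature",
    "f": "statute",
    "g": "bibliography",
    "i": "audio-visual material",
}


def frequency_counts(records):
    """Count citations per reference type and per country code."""
    by_type = Counter(r["ref_type"] for r in records)
    by_country = Counter(r["country"] for r in records)
    return by_type, by_country


# Hypothetical coded records (illustrative only, not the study's data).
sample = [
    {"ref_type": "d", "country": "GB"},
    {"ref_type": "c", "country": "US"},
    {"ref_type": "d", "country": "GB"},
    {"ref_type": "e", "country": "GB"},
]

by_type, by_country = frequency_counts(sample)
for code, n in by_type.most_common():
    print(f"{REFERENCE_TYPES[code]}: {n}")
```

Keeping the one-letter codes separate from their labels mirrors the way the references were coded first and analysed later.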


Discussion of results 

The journal sample contained 3,217 citations. Of 
these 46% (1,475) were to journals and 54% (1,735) 
to books. The total for books includes grey litera- 
ture, statutes and bibliographies. The remaining 
seven references were to audio-visual materials. 
The results show a clear 1:1 relationship between 
books and journals within professional child care. 
This is quite different from citation analyses in the 
sciences, which have shown a much greater preference
for, and reliance on, journal literature. A
significant proportion of the citations to books in 
the current study, however, were to specific papers 
or chapters within the main text. It is likely that 
child care professionals use books rather like jour- 
nals, only extracting small parts for application. 


Table 1: Citations by reference type/category

Reference type                     No.       %

Books/Bibliographies              1159   36.02
Grey literature                    543   16.87
Statutes                            33    1.02
Child care journals                216    6.71
Other child-orientated journals    296    9.20
Social work journals                78    2.42
Other journals                     885   27.51
Audio-visual materials               7    0.21
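
The category counts in Table 1 can be checked against the totals quoted in the discussion (3,217 citations, 1,475 to journals, 1,735 to books) with a short calculation. The sketch below is an illustration only. The grouping of categories into "journals" and "books" follows the text, which counts grey literature, statutes and bibliographies with books. The figure of 296 for other child-orientated journals is an assumption: it is taken as the remainder of the 1,475 journal citations after the other three journal categories, and it agrees with the reported 9.20%.

```python
# Counts per reference type, following Table 1.
counts = {
    "Books/Bibliographies": 1159,
    "Grey literature": 543,
    "Statutes": 33,
    "Child care journals": 216,
    "Other child-orientated journals": 296,  # assumed: 1475 - 216 - 78 - 885
    "Social work journals": 78,
    "Other journals": 885,
    "Audio-visual materials": 7,
}

JOURNAL_CATEGORIES = [
    "Child care journals",
    "Other child-orientated journals",
    "Social work journals",
    "Other journals",
]
# The text counts grey literature and statutes together with books.
BOOK_CATEGORIES = ["Books/Bibliographies", "Grey literature", "Statutes"]


def share(part, whole):
    """Return part as a percentage of whole."""
    return 100 * part / whole


total = sum(counts.values())
journals = sum(counts[c] for c in JOURNAL_CATEGORIES)
books = sum(counts[c] for c in BOOK_CATEGORIES)

print(f"total citations: {total}")
print(f"journals: {journals} ({share(journals, total):.0f}%)")
print(f"books: {books} ({share(books, total):.0f}%)")
```

The two shares come out at roughly 46% and 54%, which is the basis for describing the relationship between books and journals as about 1:1.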





Social Sciences Index and Abstracts) has been de- 
veloped since then, the human services area still 
lags far behind other disciplines internationally. 
The journals used for this study were: 


Adoption and Fostering 

This has been produced by the British Agencies for 
Adoption and Fostering (BAAF) since the early 
1950s (formerly as Child Adoption). Its papers are 
written by a mixture of academics, practitioners and 
service consumers and it is widely circulated to 
adoption agencies throughout the UK. 


Child Abuse Review 

This is the professional journal of the British Asso- 
ciation for the Prevention of Child Abuse and 
Neglect (BASPCAN). Originally established in the 
mid 1980s it has, since its 1992 relaunch, built up a 
good reputation as a multidisciplinary journal with 
papers from health and social care practitioners, 
psychologists, policy-makers and related profes- 
sionals such as forensic scientists. 


Child: Care Health and Development 

This multidisciplinary journal published by 
Blackwell first appeared in 1975. Its aim is to 
further co-operation between the various profes- 
sions concerned with the care of children. In this its 
primary focus has been to publish articles reporting 
original scientific research on child health and well- 
being with a particular focus on children with special
needs.


Children and Society 
This is produced by the National Children’s Bureau 
and first appeared in 1988. It seeks to relate the
social context to children and the services provided 
to them. The Bureau is the primary research body 
for children and services to them in the United 
Kingdom and was established in 1963. 


Highlights 

In 1973, following a British Library-sponsored re- 
search project, the National Children’s Bureau began 
a series of Highlights, which are two-page summa- 
ries of research and current knowledge written for 
practitioners by experts in the field. Now commis- 
sioned and edited by Nicola Hilliard, Librarian at 
the Bureau, these continue to be both widely dis- 
seminated to practitioners and used as a resource by 
students of various disciplines. They have been 
included within the survey as they are such an 
important resource within the profession. 


Journal of Adolescence 

Although the preceding journals include coverage 
of adolescence, it was felt necessary to include a 
perspective treating adolescence separately. It forms 
part of child welfare but is a distinct and discrete 
area in its own right. This journal is rather different

(Table 4 continued)

Adoption & Fostering                                      31
Journal of Child Psychology & Psychiatry                  29
Journal of Youth & Adolescence
British Journal of Psychiatry                             27
International Journal of Eating Disorders                 26
Journal of Consulting & Clinical Psychology               26
Journal of Personality & Social Psychology                26
Journal of Adolescence                                    25
All England Law Reports                                   21
Psychological Bulletin                                    21
American Journal of Psychiatry                            19
Developmental Psychology                                  18
Pediatrics                                                18
British Journal of Social Work                            17
Family Law Reports                                        17
Lancet                                                    17
Psychological Medicine                                    17
Child: Care Health Development                            16
American Journal of Orthopsychiatry                       12
American Psychologist                                     12
Community Care                                            12
Journal of the American Academy of
  Child & Adolescent Psychology                           12
Journal of Pediatrics                                     11
Acta Psychiatrica Scandinavica                            10
Adolescence                                               10
Journal of Early Adolescence                              10





So some 45% of the total citations to journals in
the sample are accounted for by 31 journals. The
top 15 journals with 20 or more citations account
for 29.29% of the citations to journals. Only 8 of
these are child-related and presumably form part of
the core for professional child care. These are Adoption
and Fostering, Archives of Diseases in
Childhood, Child Abuse & Neglect, Child Development,
Developmental Medicine & Child Neurology,
Journal of Adolescence, Journal of Child Psychology
and Psychiatry, and the Journal of Youth &
Adolescence. This core shows the importance of
medical and psychological aspects within the lit- 
erature of professional child care. The only other 
subjects to make the list of more heavily cited 
journals were law and social work. 

Of course the newness of some of the titles cho- 
sen for the sample — Child Abuse Review and Children 
& Society — puts them at a disadvantage when con- 
sidering citation counts as a predictor of value. It is 
likely that in a few years' time these two journals will 
receive more citations as their reputation increases 
and they reach a wider audience. 

books, followed by 27.51% from non-child-specific
and non-social work journals. Grey literature accounted
for 16.87% of all citations. These figures are
broadly similar to those found by Berman and 
Eaglstein (1994). Social work journals made signifi- 
cantly less impact in this study than in Berman and 
Eaglstein (1994). See Table 6. 


Table 2: Total citations in the sample 


Adoption & Fostering 347 
Child Abuse Review 482 
Child: Care Health Development 61 
Children & Society 748 


Highlights 176 
Journal of Adolescence 1003 





The 1:1 relationship shows the importance of 
books to the child care profession. Even so there 
were only six references to books which attracted 
more than four citations each and which were cited in 
different papers. These were: 


Table 3: Most cited books 


Great Britain. Children Act 1989. London: HMSO. 
(17 citations) 


Browne, K., Davies, C., and Stratton, P. eds. 
(1988) Early prediction and prevention of child 
abuse. Chichester: Wiley. (9 citations) 


Brooks-Gunn, J. and Peterson A.C. eds. (1983) 
Girls at puberty: biological and psychosocial 
perspectives. New York: Plenum. (6 citations) 


Finkelhor, D. (1986) A sourcebook on child sexual
abuse. Newbury Park, CA: Sage. (5 citations) 


Lazarus, R.S. and Folkman, S. (1984) Stress, 
appraisal and coping. New York: Springer-Verlag. 
(5 citations) 


Rutter, M., Tizard, J., and Whitmore, K. (1970) 
Education, health and behaviour. Longman. (5 
citations) 





The very low number of multiple citations sug- 
gests a field without a common core of knowledge or 
with a very diffuse body of knowledge upon which it 
draws. This is confirmed by an analysis of the jour- 
nals cited. 31 journals cited gain 10 or more citations. 
These are: 


Table 4: Most cited journals 


                                            No.   % of journals 

Archives of Diseases in Childhood           38 
Child Abuse & Neglect                       34 
Child Development                           34 
Developmental Medicine & Child Neurology    34 
British Medical Journal                     33 

pecially view journals as the route to professional 
recognition, and secondly they are a way of ensuring 
that trainee social workers identify with scientific 
methods. The same point applies to the current study, 
as child care and child-oriented journals together 
only account for 15.91% of total citations. The other 
main reason why journals are used for citation study 
is that they are easier to identify than books. 

There were some echoes of the work of Berman 
and Eaglstein (1994) but also some significant 
differences. 

There was much less self-citation of journals in the 
present study. Berman and Eaglstein (1994), in a 
reanalysis of Baker’s (1992) data, reported that 47.9% 
were self-citations. This suggests a very inward-look- 
ing profession. This is certainly not the case with this 
British study. The five sample journals cited themselves 
only 4.61% (68 times). Author self-citations 
were higher but still only 8.98% of the total. 

The diffuseness of the profession described in 
Cheung's (1990) study is confirmed by this British 
analysis. Professional child care knowledge seems to 
be similar to its social work relative in drawing on a 
range of disciplines including psychology, medical 
sciences, psychiatry, law and education. What is 
perhaps more worrying is that the child care profes- 


Publishers were also studied as part of the analy- 
sis. Only 12 were cited on more than 20 occasions: 


Table 5: Citing of publishers 


% of books, grey literature and statutes 

HMSO 
Routledge 
J Wiley 
National Children's Bureau 
Blackwell 
Sage 
L Erlbaum 
Longman 
Macmillan 
Academic Press 
Oxford University Press 
Penguin 
Springer Verlag 





Berman and Eaglstein (1994) question why, when 
less than 20% of citations stem from social work 
journals, these journals are used for citation study. 
They suggest two main reasons: that academics es- 


Table 6: Type of citations by country 


Type of citation                    US              UK              Total 
                                    No.     %       No.     %       No.     % 

BOOKS                               436                             1159    36.02 
(Berman/Eaglstein)                  (1279)                          (1762)  (34.4) 

GREY LITERATURE                                                     543     16.87 
(Berman/Eaglstein)                                                  (989)   (19.3) 

CHILD CARE JOURNALS*                                                216     6.71 
SOCIAL WORK JOURNALS*                                               78      2.42 
(*sub-total                                                         294     9.13) 
*(Berman/Eaglstein)                                                 (863)   (16.8) 

OTHER CHILD-ORIENTATED JOURNALS#                                    296     9.20 
OTHER JOURNALS#                     407                             885     27.51 
(#sub-total                         522                             1181    36.71) 
#(Berman/Eaglstein)                 (1176)                          (1515)  (29.5) 

OTHER (statutes, audio-visual)      0               37      2.21    40      1.24 

TOTAL                               1116            1672            3217    100.00 
(Berman/Eaglstein)                  (3918) (100.00) (1211) (100.00) (5129) (100.00) 

NOTE: 
The totals column also includes material from countries other than the United States and the United Kingdom. 

English language. Although this British study has 
shown a wider willingness to draw on material from 
different countries than that of American studies, it 
still displays an unwillingness to use foreign 
languages. Foreign material also showed the 1:1 
relationship between books and journals found in 
the overall study. 


Conclusion 

This study has sketched a partial picture of the 
knowledge base within professional child care in 
the United Kingdom. It is an area that would benefit 
from further research. There would also be benefit 
in studying more closely the links between child 
care and other disciplines since the field is obvi- 
ously diffuse. Books and grey literature have great 
importance within child care and, indeed, in social 
work more generally, and their contribution could 
usefully be evaluated. Building in the views of child 
care professionals would aid this exercise. 

sion does not seem to draw heavily on its own 
research base as published in child care or social 
work journals. 

In addition to drawing on a wide variety of disci- 
plines British child care also draws on material from 
a wide range of other countries. While 51.97% of the 
references from the five citation pool journals were 
from the UK, 34.69% were from the USA. 12.09% 
were from other countries, with about half of these 
being international journals. 

Nevertheless the vast majority of the material, 
even from foreign countries, was written in the 


Table 7: Non-US and non-UK sources 


International 
Australia 
Netherlands 
Germany 
Sweden 
Canada 
France 
Norway 
China 
Scandinavia 
Finland 
Israel 
Denmark 
Ireland 
Other countries 





Table 8: Non-US and non-UK sources by category 


Category                            No      % 
Child care journals                 31    7.96 
Other child-orientated journals     34    8.74 
Social work journals                 7    1.79 
Other journals                     131   33.67 
Books                               87   22.86 
Grey literature                     96   24.67 
Statutes                             3    0.77 
TOTAL book material                186   47.81 


Note: Figures do not match 100% due to rounding. 
40 items (1.24%) were not identified by country. 


Electronic paper — can it be real? 


Roger Gimson 
Hewlett Packard Laboratories, Bristol 


Abstract 


Some people have used the phrase ‘electronic paper’ to suggest that electronic information displays may replace 
the printed page. Progress towards the ideal of electronic paper is reviewed along several dimensions: the 
technologies, such as the display surface, the appearance, such as the page layout, and the function, such as the 
styles of interaction, that are currently available and may become possible in the future. 


dium. The widespread use of the fax machine shows 
that paper still holds its attractions at each end of a 
communication, despite a digital representation be- 
ing used for intermediate data transfer. Indeed, so far 
computers appear to have led to an increase in the use 
of paper at work (Frohlich 1994), rather than the 
widely predicted paperless office. 

To explore this further, it is necessary to be 
clearer about the attributes that are being compared 
between the two media. 


Electronic paper and electronic books 

For the sake of the subsequent discussion, let us 
distinguish between electronic paper and the more 
nebulous concept of the electronic book. The term 
‘electronic paper’ will be taken to mean a relatively 
close electronic approximation to the marked page. 
The term has been used on a few previous occasions, 
most notably for a project at the National Physical 
Laboratory, which explored the use of a flat-panel 
LCD (liquid crystal display) with electronic pen 
input to capture handwritten notes (Thomas 1987, 
Brocklehurst 1991), and in Freestyle, a user interface 
developed by Wang Laboratories for capturing and 
annotating printed pages in the electronic medium 
(Millikin 1988). 

On the other hand, the term ‘electronic book’ has 
become a phrase widely used to describe almost any 
information published in the electronic medium, 
whether it has any similarities to the paper medium 
or not. Here, we will confine our interest to a direct 
comparison with paper, and especially with the printed 
page. 

In the following sections, some specific attributes 
of the printed page will be compared to the capabili- 
ties of electronic technology. The electronic medium 
will be found, currently, to fall short in several 
respects. Of special interest is whether these short- 
comings are ones which can be minimised or removed 
altogether in the future. 


The ubiquity of paper 
A piece of paper is a ubiquitous carrier of information. 
It can be marked, with ink or other surface 

Introduction 

As increasing amounts of information are held and 
presented in the electronic medium, it is an appropri- 
ate time to reflect on whether ‘electronic paper’ is a 
realistic proposition now or in the future. This article 
looks at how far we have come towards the goal of 
replacing the printed page. 

In many respects, electronic storage media such 
as CD-ROM, or network-accessible disks, offer sig- 
nificant advantages of compactness and accessibility. 
However, there are many attributes of paper that are 
hard to emulate using current electronic technology 
— both statically, in terms of print quality and use of 
space, and dynamically, in terms of sense of location 
within the information and the ease with which it can 
be browsed. 

In this article, it is argued that many valuable 
attributes of paper have so far been largely ignored in 
the translation to the electronic medium. To what 
extent can we hope to capture these attributes too? 


The benefits of the electronic medium 

We are all familiar with the potential benefits of 
electronic information. Digital representations are 
more amenable to cost-effective processing, com- 
munication and storage. 

For example, once text is captured in a digital 
representation it becomes possible to offer electronic 
processing functions that are otherwise impractical. 
It would be unthinkable to use full-text search for 
information retrieval if restricted to the printed medium. 
Electronic documents can be distributed in 
seconds over networks. Digital representations allow 
many more storage options, and may be especially 
attractive for archives since they may be copied be- 
tween different storage media without loss of fidelity. 

Beyond this, there are further advantages due to 
the representational flexibility of the electronic me- 
dium — for example, it can be used to convey dynamic 
information, such as animation, sound and video 
sequences. 

However, given that there are many new possi- 
bilities opened up by digital representation, it does 
not necessarily negate the value of the printed me- 


to that of paper. The overriding factor in the quality 
of an electronic presentation system is the display 
technology. The nature of the display affects the 
readability and spatial layout of the information. 
With the batteries required to make it portable, the 
display also has a large influence on the weight, 
volume and cost of the resultant system. Table 2 
shows the size and resolution of the LCD display 
used in current notebook computers compared to that 
of information printed in various ways on a piece of 
paper. 

It can be seen that 10.4" diagonal LCD displays, 
as used in current notebook computers, are still smaller 
in area and of much lower resolution than printed fax 
pages, let alone printed magazine pages. They attempt 
to make up for this with more colours per pixel 
(picture element), but this does not improve the 
quality of monochrome text. 

The overall storage required for an uncompressed 
page image, shown in the last row of Table 2, gives a 
rough indication of the detail that can be shown using 
each presentation medium. The greater the storage, 
the finer the quality that can be presented. 
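
The storage comparison can be made concrete with a small calculation. The sketch below computes the uncompressed size of a page image and the linear resolution implied by a pixel grid and a screen diagonal. The pixel dimensions are those given in the text; the bit depths are assumptions chosen for illustration, and the function names are not taken from any cited system.

```python
import math


def uncompressed_bytes(width_px, height_px, bits_per_pixel):
    """Storage for a raw bitmap: one value of the given depth per pixel."""
    return width_px * height_px * bits_per_pixel // 8


def pixels_per_inch(width_px, height_px, diagonal_in):
    """Linear resolution implied by a pixel grid and a diagonal size."""
    return math.hypot(width_px, height_px) / diagonal_in


# Xerox P20 prototype: 3072 x 2048 monochrome pixels (figures from the text).
# ASSUMPTION: "monochrome" is taken to mean 1 bit per pixel.
p20_bytes = uncompressed_bytes(3072, 2048, 1)

# Notebook LCD: 800 x 600 pixels on a 10.4 inch diagonal (from the text).
# ASSUMPTION: 8 bits of colour per pixel, chosen only for illustration.
lcd_bytes = uncompressed_bytes(800, 600, 8)
lcd_ppi = pixels_per_inch(800, 600, 10.4)

print(f"P20: {p20_bytes / 1e6:.2f} MByte")
print(f"LCD: {lcd_bytes / 1e6:.2f} MByte at about {lcd_ppi:.0f} pixels per inch")
```

Under these assumptions the notebook screen works out at just under 100 pixels per inch, roughly a third of the linear resolution quoted for the P20.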

Of course, future advances in display technology 
will allow gradual improvement in these size and 
quality factors. The last column in Table 2 shows, for 
comparison, the attributes of a research prototype 
display. The P20 display developed at Xerox Palo 
Alto Research Center (Martin 1993) is an active 
matrix LCD screen with 3072 x 2048 monochrome 
pixels at a resolution (285 pixels per inch) approach- 
ing the 300 dots per inch of an earlier generation of 
monochrome laser printers (which now are typically 
offering 600 dpi). However, this prototype would not 
only in practice be far too costly to manufacture for 

effects, via press, laser printer or pen. Once marked, 
it is difficult to unmark. It is light, cheap and 
durable enough for most daily tasks. It can be 
annotated, bound and filed away for later reference. 
Its weight, colour, texture and general condition 
give an indication of its provenance. Its size and 
shape have evolved into a number of form factors to 
suit the different needs for portability, readability 
and ease of handling. 

How well do the capabilities of electronic tech- 
nology match these characteristics? To date, the 
nearest electronic equivalent to a piece of paper, in 
terms of portability and richness of viewable infor- 
mation, is the notebook computer with a liquid crystal 
display. Desktop monitors attached to computers, 
though offering higher display quality, are not port- 
able. However, the portable computer is perhaps 
better compared to a folio-case or box-file full of 
documents, as can be seen in Table 1. 

The notebook computer is comparable in weight 
to a folio-case containing several documents, but has 
less than half the volume. If it is assumed that the full 
cost of the folio-case and the computer is included, 
the two are comparable in terms of overall cost per 
stored page. The sub-notebook computer has a size 
and weight advantage over the notebook computer, 
but at a greater overall cost per stored page. Either 
portable computer has considerably greater total ca- 
pacity for documents, though the exact capacity 
depends on the storage medium, the document representation 
and the complexity of the document content. 


Electronic display technology 
The quality of the displayed information on a note- 
book computer is, however, nowhere near comparable 


Table 1. Comparison of weight, volume and cost of paper containers and portable computers 


                  | A4 paper | documents | documents | computer | computer 
Page sides        | 2        | 100       | 140       | 4500     | 2000 
Weight (g)        | 5        | so        | 4x        | 820      | 19 
Volume (cc)       | 6        | 8200      | 600       | so       | 2100 
Total cost        | oos      | 40        | 75        | 29       | 2000 
Cost per side (p) | 82       | 4         | 05        | 55       | 


Table 2. Comparison of sizes and resolutions of printed paper versus LCD displays 


A4 offset | A4 laser | A4 fax     | 10.4" LCD | Xerox P20 
printed   | printed  |            | 800 x 600 | AM-LCD 
colour    | colour   | monochrome | colour    | monochrome 

| 92 | 738 | 

Resolution (pixels per inch) 2400 
Total pixels (Mpixels) 
Depth of colour (bits per pixel) 
Storage (MByte uncompressed) 

Despite the rate of progress of electronic technology, 
and though graphics designers are rapidly having 
to come to terms with new interactive paradigms, it 
would be foolish to imagine we can lightly discard 
the familiar conventions, developed over centuries, 
for paper-based presentation of information. 


Distribution 

The paper medium has been the main form of infor- 
mation distribution for centuries, but distributing 
information on paper involves significant time and 
cost. One of the prime advantages of the electronic 
medium is that, given a communication infrastruc- 
ture, the time and cost to distribute information can 
be minimal. 

With paper, the sender has reasonable confidence 
that the form in which the information arrives is the 
same as the form in which it was sent. Unfortunately, 
this is not yet the case with information distributed 
and presented electronically. There has been slow 
progress towards standards for information exchange 
that can convey sufficiently rich and accurate con- 
tent to satisfy senders. This is especially the case for 
publishers, whose reputation can be linked to the 
visual quality of the received information. 

The representation of electronic information has 
for many years been either as unstructured image, for 
which the Group 3 fax is by far the most widely used 
standard, or for structured formats that are tied to the 
application which generated the information. 

In the last two years, new structured representations 
have been proposed for distribution of electronic 
information, with development of the associated software 
for presenting the information on standard 
platforms, often being offered at no cost. Examples 
of this approach include Adobe's Acrobat, Farallon 
Computing's Replica, No Hands Software's Common 
Ground and Novell's Envoy (Seybold 1994). 
Structured representations allow processing, such as 
searching, to be performed by the reader, as well as 
often reducing the storage required for a document. 

All of the systems mentioned above aim to pre- 
serve the appearance of the information as it would 
be seen on the printed page. However, since current 
screens cannot display a whole page at a time while 
maintaining acceptable readability, much of the im- 
mediacy of the full page layout is lost. Often the 
screen is used simply to confirm the identity of the 
document, so that it can then be printed and read 
from the paper medium. 

For less formal communication, paper can be 
used in many formats to support social interaction in 
the workplace (Frohlich 1994). Memos, stick-on 
notes and printed forms all have their place alongside 
documents and reports, with ease of annotation be- 
ing an important attribute. Until recently, the 
electronic medium has only been able to offer single- 
font textual e-mail (electronic mail) to support 
inter-personal communication. Richer electronic formats 
are being developed, but their rate of penetration 

many years, but its high power consumption would 
also not allow it to be battery operated for the 
foreseeable future. 

As far as the portable, battery powered display is 
concerned, research into new kinds of bistable pas- 
sive LCD currently underway (Surguy 1993, Mosley 
1994) may lead to larger, higher quality displays 
that consume power only when being updated. In 
addition, development of new flexible substrates 
may offer lighter, more resilient display panels. 


Readability 

It has been seen that display technology may take 
some time to reach the size and quality of the printed 
page, especially in fineness of detail. However, 
there is reasonable evidence that electronic displays 
can compete already in terms of readability of larger 
text fonts. Comparisons of the readability of infor- 
mation displayed on CRTs (cathode ray tubes) have 
been carried out since the days of character-based 
displays. 

With the advent of raster graphics and higher 
resolutions, reading from CRT screens has improved 
to the point where, on some measures, reading speed 
and comprehension are found to be comparable to 
that of the paper medium (Muter 1991). However, in 
that experiment, skimming, in order to grasp a gen- 
eral sense of the content, was still found to be some 
40% slower from a CRT. The latter may depend 
more on the spatial context of the text presentation 
than on the readability of individual words. 

Very few studies have been carried out to date on 
the readability of LCD screens, or of using known 
techniques, such as anti-aliased text, to improve the 
fidelity to the printed form. In general, there are so 
many factors to be taken into account when assess- 
ing readability that the experimental evidence is 
rarely consistent (see Dillon 1992 for a review of the 
literature in this area). 

With displays of text, readable screen fonts can 
be created provided there is sufficient screen resolu- 
tion. Typically, finer details of the typeface that 
would be preserved on paper are lost on the screen, 
but without significant loss of readability. For handwritten 
text and drawing, however, and for finer 
graphical detail, there is still a long way to go before 
display technology can come even close to the quality 
of paper and pen. Even for screen text fonts, LCD 
displays do not currently have enough pixels to 
display a complete, readable A4 page. 

The spatial layout of information on the page is 
an aid to recall (Lovelace 1983) and can be of help 
when browsing and finding one's way around a 
longer document. In systems in which information 
is reformatted to fit the arbitrary size of a display 
window, much of this spatial context can be lost. 
Research is in progress to see if the essence of 
graphic layout can be retained even when the pa- 
rameters of the viewing space are different from 
those of the printed page (Weitzman 1994). 

document concerning its use could add to the richness 
of the electronic medium (Hill 1992), provided 
it does not impose an additional burden on the user in 
capturing or accessing the data. 
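
One way to picture such usage history is as a counter that is updated whenever a page is displayed, so that nothing extra is asked of the reader. The sketch below is purely illustrative: the class and method names are hypothetical and it is not a description of any of the systems cited.

```python
from collections import Counter


class ReadWearLog:
    """Counts how often each page of a document has been displayed."""

    def __init__(self):
        self._views = Counter()

    def record_view(self, page):
        # Called by the viewer each time a page is shown; the reader
        # does nothing extra, so capture imposes no additional burden.
        self._views[page] += 1

    def most_used(self, n=3):
        """Return the n most frequently viewed pages with their counts."""
        return self._views.most_common(n)


log = ReadWearLog()
for page in [1, 2, 7, 7, 3, 7, 2]:
    log.record_view(page)

print(log.most_used(2))
```

A browser could use counts of this kind to mark regularly accessed pages, much as a well-thumbed book falls open at its most used sections.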


The aesthetics of paper 

In the rush to provide additional functionality in the 
electronic medium, it is all too easy to forget some of 
the aesthetic factors that may be the ultimate barrier 
to our full acceptance of it. 

The feel, texture, quality of binding and the 
experience of handling a familiar book are too 
subtle to have yet been captured. The personalised 
nature of the pencilled notes in the margin, or the 
turned-down corner of a page are absent. Above all, 
the simplicity and directness of perception and physical 
manipulation of the page remain beyond our 
ability to emulate. 


Conclusions 

This article has aimed to show that in terms of not 
only information content, but also many other tan- 
gible and intangible aspects, the electronic 
presentation of information is not yet comparable 
to that of paper. 

Electronics offers compact, lightweight means of 
storage. In specific areas, such as ease of lookup for 
reference works, or support for animation to illus- 
trate educational texts, the electronic medium could 
offer a particular advantage. This is especially the 
case if interaction increases engagement with the 
presented material. In general, however, the paper 
medium is not in any danger of being displaced in the 
foreseeable future. Its quality and universality will 
be hard to beat, especially with the need for power- 
consuming devices to mediate access. 

An analysis of the places where the electronic 
medium falls short of the attributes of paper can be 
used to point to opportunities for further developments. 
It is not simply a question of emulating the 
paper medium, but of identifying the places where 
there is a real disparity and using them to motivate 
new advances in the electronic presentation of infor- 
mation. 

In this way, it is hoped that in the future it will not 
be necessary to give up some important attributes of 
paper in order to reap the additional benefits of the 
electronic medium. 

will depend on the rate of adoption of computer 
equipment, such as integrated pen-input devices, 
that can support the extra functionality. 


Manipulation 

The comparison so far has been of the static presen- 
tation of electronic information to that on the printed 
page. It is now necessary to look at the extent to 
which the electronic medium can match the versatil- 
ity of paper in its dynamic use. 

Many people in our culture learn how to manipu- 
late paper documents from an early age. Turning 
pages, and especially the use of the thumb to control 
the rate of turning, is possible before the age of two. 
At the document level, the use of spatial location of 
a document on a surface, for example to separate 
those documents already browsed from those still to 
be considered, is also a concept learnt at a very early 
age. Perhaps it is simply because many of these 
behaviours have become automatic, and so invisible 
to us, that designers of electronic information sys- 
tems have seldom provided analogous manipulations. 

In recent years, interest has grown in addressing 
some of the less formal manipulations which are so 
natural for paper. For example, though folders have 
been used in the past in several contexts, no elec- 
tronic filing system has yet successfully captured the 
notion of manipulating a pile of documents, even 
though this is such a common organising principle in 
physical offices. Electronic interface concepts based 
on this idea have now started to be explored (Mander 
1992). 

The keyboard and mouse may be good input 
devices for data entry and selection. However, they 
leave much to be desired as a mechanism for page- 
turning and riffling. A study carried out at Hewlett 
Packard Laboratories in Bristol has found positive 
reactions from users to a prototype hand-held elec- 
tronic information browser. This allowed two-handed 
page-turning using thumb-operated pressure pads — 
more directly comparable to the way a book is held 
and manipulated than existing keyboard or mouse- 
driven interfaces (Hawkins 1993). 

The work of others has, for example, been aimed 
at presentation techniques that are relevant for orien- 
tation and navigation in the electronic medium in a 
way that can be perceived by the user without addi- 
tional cognitive load (Nygren 1992). As the latter 
paper states: ‘one of the characteristics of a good 
interface is that it appears obvious to the user’. 

Another phenomenon that is usually ignored is 
the impact that continued use has on the paper me- 
dium itself — general wear and tear. It happens without 
deliberate intent, but can convey useful information 
completely separate from the document contents, 
such as indicating regularly accessed pages. Here the 
conventional wisdom might be that the electronic 
medium has an advantage because it does not wear 
out — but this is at the cost of some potentially useful 
data. Retaining historical data within an electronic 

Charging and paying for information on open networks 


While this paper focuses on payment mechanisms, 
it is useful in the context of the above scenario 
to clarify the distinction between payment and the 
issues of pricing, charging, accounting and billing. 
The published price for an item of information (or 
service) may bear little resemblance to what a par- 
ticular user is charged for it during a specific 
transaction; furthermore, when a bill is issued to 
whom, and who pays it, when and how, are all open 
questions. The following is offered as a way of 
clarifying the issues: 

• pricing provides a basis for charging 

• charging is the process of applying a relevant 
pricing policy to a specific transaction context to 
determine the amount chargeable 

• billing is the process of requesting payment 

• payment is the transfer of monetary value from 
buyer to merchant. 

The rest of this paper is concerned with the 
technologies required for paying electronically over 
the Internet. Electronic transfer of money between 
bank accounts is commonplace, but this takes place 
over secure dedicated networks. The electronic pay- 
ment systems that are needed will have the same end 
result, but will involve interactions between buyers, 
merchants and financial institutions over open, in- 
trinsically insecure networks. The systems also need 
to be integrated seamlessly with the networked tools, 
applications and protocols that support the commer- 
cial activity over the Internet. Currently, it would 
appear that the World Wide Web is the platform of 
choice for this activity. 


The technology options 

A number of schemes and systems have emerged 
over the last year or so, addressing the needs of open 
networked payment. In trying to compare and contrast 
the various schemes, the following attributes 
were found to be useful: 

1. access to trusted third party — this may be online, 
where some secure server belonging to the TTP is 
accessed during a transaction, or offline, where 
the TTP service is used independently of the transactions 
(e.g. for key certification, and for funds 
transfer requests). 

As in any monetary system, there is always a 
requirement for a trusted third party. Systems 
based on public key cryptography require a system 
of TTPs to issue keys or to certify their ownership. 
Systems that rely on external tamper-proof devices 

Introduction 

With the current commercialization of the Internet, 
rapid development of mechanisms and systems to 
support commercial transactions over open, public 
networks can be seen. This paper sets out to review 
the emerging technologies with a specific applica- 
tion scenario in mind: that of online access to 
commercially-operated bibliographic information and 
full-text document delivery services via the World- 
Wide-Web. 


Some basic requirements 
Although much of the popularity of the Internet 
may be ascribed to the ready availability of ‘free’ 
information, it is becoming clear that information 
that has to be relied upon has to be paid for. The first 
question that arises in the scenario where a user is 
able to search a bibliographic database via the 
Internet and to access directly the full text of the 
articles found is how should the information be 
priced. The basic choice is between a subscription 
model and a transaction model, although these are 
not mutually exclusive. The former implies a need 
to authenticate the identity of the user as a sub- 
scriber with less need to keep track of the use made 
of the information; charging is usually completely 
independent and separate from the access to the 
information. Transaction-based pricing, however, 
implies a direct link between usage and charging, 
with a need for a well-defined pricing model to 
enable charges to be calculated from the actual 
elements of usage. These usage elements can vary 
in granularity and nature, e.g. per-byte, per-minute, 
per-page, per-hit. 
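
A transaction-based charge of this kind can be sketched as a pricing policy applied to the usage elements recorded for one session. The unit rates, class names and function name below are hypothetical, introduced only to illustrate the calculation; decimal arithmetic is used so that monetary amounts are not subject to binary rounding.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class PricingPolicy:
    """Published unit prices. All rates used here are hypothetical."""
    per_hit: Decimal
    per_page: Decimal
    per_minute: Decimal


@dataclass(frozen=True)
class Usage:
    """Usage elements recorded for one transaction."""
    hits: int = 0
    pages: int = 0
    minutes: int = 0


def charge(policy: PricingPolicy, usage: Usage) -> Decimal:
    """Apply the pricing policy to the recorded usage."""
    return (policy.per_hit * usage.hits
            + policy.per_page * usage.pages
            + policy.per_minute * usage.minutes)


policy = PricingPolicy(per_hit=Decimal("0.05"),
                       per_page=Decimal("0.40"),
                       per_minute=Decimal("0.10"))
session = Usage(hits=12, pages=3, minutes=7)
amount = charge(policy, session)
print(amount)
```

A subscription model would need none of this per-session bookkeeping, only a check that the user is a current subscriber.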

The user who has to pay for the use of a bibliographic 
database and on-line document delivery 
service will have a number of concerns: 

• atomicity — what he pays for should be what he 
gets, regardless of system or network problems 
that might arise during a session 

• security — he needs to be safe from any fraudulent 
behaviour that might result in financial loss 

• anonymity — he may wish to keep his information 
purchases private 

• charging — he will want to be able to check that 
what he gets is what he wants before being committed 
to paying for it 

• payment — he may wish to choose how to pay 

• usability — he will want to use standard easy-to-use 
tools for access to the information. 





thentication, integrity, confidentiality and non- 
repudiability are met. However it is probably not 
cost-effective for small purchases. This is the primary 
rationale for indirect systems, where the 
new accounting entity (network payment opera- 
tor) is able to aggregate small transactions and 
carry out transfers with the user's main account on 
a periodic basis. 


4. Security mechanism — this could be intrinsic, relying 
solely on cryptographic techniques in software, 
or extrinsic, using a tamper-resistant device such 
as a smart card or PCMCIA device. 

Intrinsic systems are generally secure as far as 
network transactions are concerned; they are open 
to attack at the level of the workstation, or from a 
fraudulent user. In general, the use of extrinsic 
security mechanisms is associated with token-based 
systems. It probably makes it easier to protect 
against certain types of fraud, such as double-spending 
of tokens, or attacks on the integrity of 
the user workstation. Furthermore it may have 
benefits in terms of user perception, in providing a 
tangible object representing monetary value. There 
is obviously a cost implication, both in terms of 
the device itself, and an appropriate interface on 
the workstation. 


5. Degree of privacy - this could be at several levels:

Level 0: seller knows who is buying, others can see who is buying what
Level 1: seller knows who is buying but outside parties cannot see anything
Level 2: seller does not know who is buying, but buyer can be traced if required
Level 3: no way for anyone to trace the buyer.

The following table categorizes a number of pay- 
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require TTPs to provide such devices, and to up- 
date their contents. Any practical networked 
payment system will need to provide access to a 
TTP server at some stage in the proceedings. 

In general, online systems offer better protection 
against fraud, but face potential bottlenecks at 
high levels of transaction traffic. 

2. Nature of value representation — this may be a 
payment order, which is a request to a third party 
to transfer a specified quantity of funds to a speci- 
fied recipient, or a token, indistinguishable from 
cash. 

Order-based systems are the electronic equivalent 
of cheques, credit cards, debit cards and so on. 
Funds leave the buyer's account and go into the 
seller's account as a result of a payment order (or 
financial instrument, such as a cheque or a credit 
card transaction record). There may well be a 
clearing delay, but the funds involved never leave 
the banking system. Token-based systems on the 
other hand involve debiting the buyer's account 
when the token is issued; the seller's account is 
credited only after a received token is submitted to 
the seller’s bank. This is clearly analogous to cash 
transactions, although clearly a digital token has 
potentially more complex behaviour than cash 
(e.g. the ability to impose restrictions on what it 
can be spent on). 

3. Relationship to existing financial systems - this
may be direct, where existing accounts (e.g. with a
credit card company) are used for each transaction,
or indirect, where a new independent
accounting entity is created to manage transactions.

Clearly, using existing credit card accounts is
perhaps the most straightforward way of making
payments, as long as basic requirements of au- 


ment systems in terms of these five attributes. The
specific systems are summarised below.

[Table not recoverable from the source. Surviving headings: Value representation, Token, Offline, Direct. Surviving entries: First Data, OpenMarket, Cybercash, NetBill, ecash.]
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tion must be the same as the one used by the sender; 
common standards are known as MD2, MD4 and 
MD5: these vary in the ratio between processing 
speed and cryptographic strength’. 
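
The sign-and-verify cycle described above can be sketched as follows. This is an illustration under stated assumptions, not a usable implementation: the RSA key is a toy one built from two small known primes, no padding is applied, the digest is reduced modulo the key so that the toy key can sign it, and SHA-256 from the Python standard library stands in for the MD2, MD4 and MD5 functions named in the text. The function names are hypothetical.

```python
import hashlib

# Toy RSA key pair from two known Mersenne primes (illustration only:
# far too small, and unpadded, for any real use).
P, Q, E = 2**61 - 1, 2**89 - 1, 65537
N = P * Q
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (needs Python 3.8+)


def digest(message: bytes) -> int:
    """Digital fingerprint of the message, reduced modulo N for the toy key."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(message: bytes) -> int:
    """Sender: transform the message digest with the private key."""
    return pow(digest(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    """Recipient: undo the signature with the public key, compare digests."""
    return pow(signature, E, N) == digest(message)


signature = sign(b"pay 5 pounds to the document supplier")
print(verify(b"pay 5 pounds to the document supplier", signature))
print(verify(b"pay 50 pounds to the document supplier", signature))
```

Both parties must compute the digest with the same hash function, which is why `digest` is shared by `sign` and `verify`.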

A public domain digital signature implementa- 
tion called PGP (Pretty Good Privacy) is becoming 
popular on the Internet. A number of payment sys- 
tems will support its use. 


Authenticity 

While the digital signature can assure that the mes- 
sage received is what was sent, it cannot on its own 
assure that the sender is not an impostor. In other 
words, the recipient needs to be able to trust that the 
public key that is obtained for decrypting the signa- 
ture actually does belong to the purported sender. 
This trust may be achieved in various ways, but in 
general it requires the use of a third party to certify 
that the key does indeed belong to the sender: the 
certifying authority (CA) uses its private key to sign 
the certificate digitally which basically links the 
sender’s public key with its name. Of course, if the 
trustworthiness of the CA might be in question, its 
public key could be certified by a higher level CA 
(and this could be repeated up to a level where trust 
is implicit). 

The CCITT X.509 standard* defines a widely 
accepted format for public key certificates, which is 
endorsed by the Internet Privacy-Enhanced Mail 
standard (PEM) and compatible with the PKCS set 
of standards‘. 


Non-repudiation 
The sender of a message that is cryptographically 
verified (through the use of the sender’s public key) 
as authentic and unaltered is not in a position to 
repudiate its transmission, as the message must have 
been signed using the sender’s private key. 

Non-repudiability of receipt can be assured by 
the recipient sending a receipt consisting of the re- 
ceived digital signature encrypted using the 
recipient’s private key; the original sender uses the 
recipient’s public key to decrypt this receipt, and 
hence check that the original signature is what was 
originally sent. 

Other security requirements for electronic com- 
merce include: 


Time-stamping 
Commercial transactions are often time-sensitive. A
time-stamp can easily be incorporated into a message;
again a trusted third party is usually required to
provide a verifiable time-stamp; an example is the
BellCore DTS’.


Electronic cash tokens 

Digital signatures provide a basic means of repre- 
senting cash: in Chaum’s scheme’, the payer obtains 
cash from a digital bank by asking the bank digitally 
to sign a note number issued randomly by the payer; 
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Cryptographic techniques and standards 
Cryptographic techniques underpin most of the sys- 
tems covered in this paper; they are orthogonal to the 
various layers of protocols on which the systems are 
based. The following are the basic requirements for 
supporting secure transactions, whether the mes- 
sages represent credit card details or cash value 
tokens; additional requirements are imposed by the 
need to deal with problems such as double spending 
and anonymity. Token-based electronic cash sys- 
tems differ from payment order-based systems in 
that it is the issuing bank that has to provide the 
recipient with an easy way of verifying that the token 
is genuinely worth the amount it purports to repre- 
sent, just as a bank note has to have difficult-to-forge 
features such as watermarks and metallic inserts. 
The basic security requirements are: 
• authenticity: how to be sure that the other party in
a transaction is who they are supposed to be
• integrity: how to be sure that the received message
is exactly what was sent
• confidentiality: how to be sure that no unauthorised
third party can read the message
• non-repudiability: how to ensure that the sender
cannot deny sending the message, and that the
receiver cannot deny receiving it.


Confidentiality 

The basic technique involved is encryption. This 
may be carried out using secret key or public key 
methods. Secret key methods such as DES’, RC2 and 
RC4 involve using the same key to encrypt and 
decrypt the information; this requires the key to be 
kept secret although it has to be shared between the 
parties concerned, requiring secure key management 
procedures. Public key methods such as RSA? or 
PGP involve using different keys for encryption and 
decryption: generally one key is kept secret while the 
other can be disclosed publicly. Sending a confiden- 
tial message securely to someone involves using the 
recipient's public key to encrypt the message, in the 
knowledge that the recipient will be the only one able 
to decrypt it. 

Although public key methods are easier to man- 
age, they are generally slower than secret key 
methods; a common compromise is to use a 'digital 
envelope' in which the message text is encrypted 
with a secret key, which itself is then encrypted with 
the recipient's public key. 
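
A digital envelope of this kind might be sketched as below. Everything here is a stand-in chosen so that the example runs with the standard library alone: the public key pair is a toy unpadded RSA key, and the secret key cipher is a simple SHA-256 keystream XOR rather than DES, RC2 or RC4. The names `seal` and `open_envelope` are hypothetical.

```python
import hashlib
import secrets

# Toy RSA key pair from two known Mersenne primes (illustration only).
P, Q, E = 2**61 - 1, 2**89 - 1, 65537
N = P * Q
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (needs Python 3.8+)


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Stand-in secret key cipher: XOR data with a SHA-256 counter keystream.

    The same call both encrypts and decrypts.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        block = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ k for b, k in zip(data[offset:offset + 32], block))
    return bytes(out)


def seal(message: bytes, recipient_public=(E, N)):
    """Encrypt the text with a fresh secret key, then wrap that key."""
    e, n = recipient_public
    session_key = secrets.token_bytes(16)  # 128 bits, smaller than N
    wrapped_key = pow(int.from_bytes(session_key, "big"), e, n)
    return wrapped_key, keystream_xor(session_key, message)


def open_envelope(wrapped_key: int, ciphertext: bytes, private=(D, N)) -> bytes:
    """Recover the secret key with the private key, then decrypt the text."""
    d, n = private
    session_key = pow(wrapped_key, d, n).to_bytes(16, "big")
    return keystream_xor(session_key, ciphertext)


wrapped, sealed = seal(b"Please supply article 42")
print(open_envelope(wrapped, sealed))
```

Only the short session key goes through the slow public key operation; the message itself, however long, is handled by the fast secret key cipher.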


Integrity 

The sender of a message can make its integrity 
verifiable by using a hash function to make a digital 
fingerprint of the message (known as a message 
digest), which is then encrypted with the sender’s 
private key and transmitted along with the message 
as a digital signature. By using the sender’s public 
key to decrypt the signature, the recipient is able to 
verify that the hash function produces the same 
fingerprint from the message. Clearly, the hash func- 
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CERN is in the process of defining an alternative 
approach called Shen; details are still unclear. 


SSL (Secure Sockets Layer) 

This actually operates at the socket interface be- 
tween TCP/IP and the applications. When a TCP 
socket connection is made between two hosts, SSL
first allows the client to authenticate the server and to 
negotiate specific security levels with it, resulting in 
agreement on a specific pair of symmetric keys for 
the session. The session key is used to secure all 
further messages, including a client authentication 
exchange if required. For Web applications, SSL sits 
between TCP/IP and HTTP; all HTTP traffic is thus 
transported over a secure connection, once it is es- 
tablished. This means that all application data, 
including URLs, credit-card numbers, other data 
entered via HTML forms, are securely encrypted. 


SHTTP (Secure HyperText Transfer Protocol) 
The SHTTP proposal adds security mechanisms into 
the HTTP protocol platform in order to enable WWW 
clients and servers to authenticate each other sym- 
metrically, and exchange confidential, non-repudiable 
messages. The protocol allows the two parties to 
negotiate, for each transaction, which specific key 
management mechanisms, security policies and 
cryptographic algorithms are to be used. Any message
(in either direction) may be signed, authenticated,
encrypted, or any combination of these. To cope with
sending secured messages to someone who has no 
key pair, it provides means of pre-arranging symmet- 
ric session keys, in addition to the usual key 
management schemes. 

SHTTP provides for fast verification of message
authenticity and integrity (without non-repudiability)
using a shared secret key (i.e. a password).
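
Shared-secret verification of this kind can be sketched with a keyed hash. The source does not say which construction SHTTP uses, so HMAC with SHA-256 from the Python standard library is used here purely as a stand-in, and the function names are hypothetical.

```python
import hashlib
import hmac


def tag(shared_secret: bytes, message: bytes) -> bytes:
    """Sender: compute a keyed fingerprint of the message."""
    return hmac.new(shared_secret, message, hashlib.sha256).digest()


def check(shared_secret: bytes, message: bytes, received_tag: bytes) -> bool:
    """Recipient: recompute the fingerprint and compare in constant time."""
    return hmac.compare_digest(tag(shared_secret, message), received_tag)


secret = b"password agreed in advance"
message = b"GET /articles/42"
print(check(secret, message, tag(secret, message)))
```

Because both parties hold the same secret, either of them could have produced a given tag. This is why the method is fast but cannot provide non-repudiability.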


Building commercial applications on the Web 

A number of suppliers have introduced or announced
Web servers and clients which have security capa- 
bility, and hence are able to support electronic 
commerce in some form. The majority of these are 
based on the use of SHTTP which has been made 
available through a toolkit called SecureWeb from 
Terisa Inc. The companies include Spry, Spyglass, 
Open Market, CyberCash, Verity, O'Reilly, Tan- 
dem. The use of SSL is so far limited to its originator, 
Netscape, although a few others such as Open Mar- 
ket and Tandem have announced their intention to 
support it. 


Examples of payment order systems 

Most of the systems of this type require both parties 
to open accounts with the payment service provider, 
with the exception of First Data. First Data is an 
established credit card handling agency and is therefore
able in effect to allow users of the Netscape
client/server combination (secured with SSL) to in- 
teract directly with the common card providers. 
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the bank uses a specific key pair which is publicly 
associated with the value requested. The bank debits 
the user's account (or accepts hard cash) at the time 
of issue. When payment is made, the payee is able to 
verify (with the appropriate public key from the 
bank) that the token is genuine; on receiving it from 
the payee the bank is also able to verify that they did 
issue it, and hence enable them to credit the payee's 
account. 


Double spending 

The capability for sending non-repudiably authentic 
messages enables the issue of digital cash tokens; 
however, several problems arise with this. Firstly, it 
must not be possible to spend a digital token more 
than once. The use of extrinsic techniques such as a 
smart card, or transaction-time access to a third party 
server, make it possible to check for, and prevent, 
attempts at double spending. Chaum and his 
colleagues? devised offline software-only methods 
to detect double spending after the event. 
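
The online form of this check, in which a third party server is consulted at transaction time, can be sketched very simply. The `TokenRegistry` class and the serial numbers are hypothetical; the offline after-the-event detection devised by Chaum and his colleagues is more involved and is not shown.

```python
class TokenRegistry:
    """Hypothetical server-side record of token serial numbers already spent."""

    def __init__(self):
        self.spent = set()

    def redeem(self, serial: str) -> bool:
        """Accept a token the first time it is seen, refuse it afterwards."""
        if serial in self.spent:
            return False
        self.spent.add(serial)
        return True


registry = TokenRegistry()
print(registry.redeem("note-0001"))  # first use is accepted
print(registry.redeem("note-0001"))  # second use is refused
```

Every payment requires a round trip to the registry, which is the potential bottleneck of online systems at high transaction volumes.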


Anonymity 

The requirement for anonymity in payment is almost 
the complete opposite to authentication: here the 
payee needs to be assured that the payment received 
(whether a payment order or a cash token) is indeed 
authentic without knowing the identity of the payer. 
A group at AT&T” have devised systems for anony- 
mous payment orders which rely on a trusted third 
party to mediate in a transaction, maintaining a bar- 
rier which preserves anonymity. Token-based systems 
achieve privacy through ‘blind signatures’ (see 
Chaum?). The key to this is that when the issuing 
bank signs a note on behalf of a payer, the note 
number it sees is untraceably different from the note 
number that is used when payment is made. This uses 
a patented technique which allows the user to alter 
the note number before signing in such a way that it 
can be restored after signing. 
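
The source gives no details of the technique, but the textbook RSA form of a blind signature illustrates the idea. The sketch below assumes a toy unpadded RSA key for the bank and hypothetical function names; it is not a description of any deployed system.

```python
import math
import secrets

# Toy RSA key pair for the bank, from two known Mersenne primes.
P, Q, E = 2**61 - 1, 2**89 - 1, 65537
N = P * Q
D = pow(E, -1, (P - 1) * (Q - 1))  # bank's private exponent (Python 3.8+)


def random_blinding_factor() -> int:
    """Pick a random r that is invertible modulo N."""
    while True:
        r = secrets.randbelow(N - 2) + 2
        if math.gcd(r, N) == 1:
            return r


def blind(note: int, r: int) -> int:
    """Payer: disguise the note number before sending it to the bank."""
    return (note * pow(r, E, N)) % N


def bank_sign(blinded_note: int) -> int:
    """Bank: sign whatever number it is shown."""
    return pow(blinded_note, D, N)


def unblind(signed_blinded: int, r: int) -> int:
    """Payer: strip the disguise, leaving a signature on the real note."""
    return (signed_blinded * pow(r, -1, N)) % N


def verify(note: int, signature: int) -> bool:
    """Payee: check the bank's signature on the note number."""
    return pow(signature, E, N) == note % N


note_number = 20240117
r = random_blinding_factor()
signature = unblind(bank_sign(blind(note_number, r)), r)
print(verify(note_number, signature))
```

The bank signs the blinded value, yet the result after unblinding is identical to a direct signature on the note number, because signing the factor r raised to the public exponent yields r itself, which the payer then divides out.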


Secure Internet protocols 

Most of the techniques outlined above are of general 
relevance to secure communications, and are best 
incorporated into the communications infrastructure 
to be used, rather than having to be built into particu- 
lar applications. In the context of the Internet, and of 
the World Wide Web in particular, there are a number 
of approaches to the development of a secure envi- 
ronment. At one level, it is possible to incorporate 
security mechanisms into the basic underlying TCP/ 
IP transport mechanism — the SSL protocol proposed 
and implemented by Netscape Communications" is 
an approach towards this. With this approach it should 
then be possible to build secure networked applica- 
tions using standard higher-level protocols such as 
those used for the WWW, newsgroups, or email. 
Alternatively, the mechanisms could be built into a 
specific higher-level protocol such as HTTP for the 
Web. This is the approach used in EIT's SHTTP.
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Stefan Brands? has proposed a version of elec- 
tronic cash which relies on tamper-proof hardware to 
prevent double spending. 

The major banks are now starting digital cash 
initiatives (initially independently of the Internet, 
but with the intention of converging) such as 
Mondex?‘ from National Westminster, who are start- 
ing a major trial in Swindon. Visa have announced 
their digital purse project. These are basically smart 
card schemes providing secure management of elec- 
tronic cash as an extension of the user's normal 
account. 


Legal and political issues 

All monetary systems are subject to various laws and 
regulations providing controls and safeguards for the 
use of money. It is still not clear how the existing 
framework will apply to electronic money, particularly
of the token variety. Electronic payment systems
based on payment orders should not pose any prob- 
lems in this respect — the only issue might be the 
ultimate acceptability of digital signatures. 

The reliance on cryptographic techniques raises 
two separate issues. One is the reluctance of some 
governments to allow the use of strong encryption 
over public networks unless they have the ability to 
decrypt any message. Key escrow is the normal 
approach to this. The other is the reluctance of some 
governments (the US in particular) to allow the 
export of advanced cryptographic technology on the 
basis that it might be used against them. For exam- 
ple, the RC4 bulk encryption technique may be 
exported from the US as long as it is limited to 40-bit 
keys. The implication of this is that a government is 
more likely to be able to afford the computing power 
needed to break such a code than a random criminal, 
whereas if the full 128-bit key is used, the power 
needed is in practical terms infinite. There are fewer 
restrictions on technologies for authentication as 
opposed to encryption. 
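
The gap between the two key lengths can be made concrete with a short calculation. The search rate of one million keys per second is an assumption chosen only for illustration, and the function name is hypothetical.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def years_to_search(key_bits: int, keys_per_second: float) -> float:
    """Time to try every key of the given length at the given rate."""
    return 2 ** key_bits / keys_per_second / SECONDS_PER_YEAR


rate = 1e6  # assumed search rate, keys per second
print(f"40-bit keys:  {years_to_search(40, rate) * 365:.1f} days")
print(f"128-bit keys: {years_to_search(128, rate):.2e} years")
```

Each extra bit doubles the work, so the 128-bit search is 2^88 times longer than the 40-bit one whatever rate is assumed. At the assumed rate the 40-bit space is exhausted in under a fortnight, while the 128-bit space would take on the order of 10^25 years.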

These restrictions give rise to the requirement for 
the various secure network protocols used in elec- 
tronic payment to be flexible in their choice of 
encryption methods. 


Conclusion 
The area of open networked payment systems is 
moving very rapidly, with a number of trials al- 
ready under way in various parts of the world. The 
banking community is now beginning to show an 
interest, and in some cases, even taking a lead. 
Suppliers of Web software are incorporating secu- 
rity enhancements into their products, with every 
indication that convergence between different pro- 
tocol approaches is likely. The indications are that 
paying for goods over the Internet will become 
commonplace and widely accepted within the next 
few years. 

For the bibliographic information community, 
the priority should be to trial and validate a number 
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Most of them also are online svstems with the 
service provider being in the transaction loop. The 
exceptions here are NetCheque and NetChex, which 
essentially provide electronic analogues of conven- 
tional paper cheques, relying on digital signature 
techniques. 

In general, payment order systems do not provide 
any privacy to purchasers; however, a group at AT&T 
have proposed a scheme called Anonymous Internet 
Mercantile Protocol which does support anonymity’. 
This relies on separating out the various components 
of a transaction and using cryptographic techniques 
to hide sensitive information from either party. 

First Virtual and First Bank of Internet are inter- 
esting variants which might be termed ‘factoring’ 
schemes. FV is based on credit cards, but avoids 
sending card details over the network. The FV server
keeps the buyer’s card details; the buyer
sends an FV-issued ID number to the merchant, who
sends it on to FV together with payment details, and
sends the goods to the buyer. FV will deduct the
payment from the buyer’s account only after receiv- 
ing confirmation from the buyer that the goods are 
satisfactory, and that the invoice is correct. They 
claim that security techniques are not needed for the 
message transfers. 

FBOI is based on ATM cards procured by FBOI
from Visa. On prepayment, each user (buyer and 
merchant) is issued a card with FBOI keeping a 
duplicate. Each transaction involves FBOI reconcil- 
ing electronic cheques and invoices, and transferring 
funds between ATM accounts appropriately. The 
use of the ATM network makes it easy for users to 
withdraw real cash at any time. 

NetBill’’, Open Market! and CyberCash are fairly 
similar in their approach, relying on an online server 
handling user accounts. They take advantage of the 
indirect accounting approach to aggregate small pay- 
ments and hence reduce the transaction costs 
substantially (in the order of 1 cent). Open Market 
and CyberCash do offer the ability to interact di- 
rectly with existing bank or credit card accounts, for 
larger transactions. In fact, CyberCash claim that 
they intend to support a variety of payment mecha- 
nisms, including a token-based scheme. NetBill places 
particular emphasis on ‘certified delivery’ which 
ensures that goods are delivered if and only if they 
have been paid for. 


Examples of token-based schemes 

NetCash" relies on having a close link to existing 
financial infrastructures to facilitate translation 
between anonymous electronic currency and non- 
anonymous electronic cheques. 

DigiCash's ecash uses blind signature technol- 
ogy to provide anonymous cash tokens, with a specific 
technique which will detect double spending after 
the event, and at the same time identify the perpetra- 
tor?*, This system is being trialled on the Internet at 
present. 
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of the emerging technologies, particularly those that 
offer a high degree of flexibility in charging and 
payment, as well as having very low transaction 
costs. The need to cope with varied and variable 
charging policies is probably the most important re- 
quirement; the ease with which particular payment 
systems can be configured, adapted and integrated 
with future networked bibliographic information and 
document delivery services is also a relevant issue. 
These aspects can only be evaluated through carrying 
out experimental implementations of such systems. 
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Abstract 

Artificial intelligence, including expert systems, fuzzy logic, neural networks and genetic algorithms, is increas- 
ingly being applied to the solution of a wide range of problems in the monitoring and operation of electricity 
supply systems. Following the privatization of the Electricity Supply Industry in England and Wales in 1990, there 
is an overriding commercial incentive for the privatized electricity companies to operate the high voltage 
transmission networks as economically as possible without compromising their reliability in a climate of
substantial uncertainty as to the generator prices and availabilities that are bid into the pool from day to day and 
the energy trading contracts that have to be implemented. These circumstances often mean that the transmission 
and distribution networks must operate close to their defined security limits and still be capable of surviving severe 
disturbances. Hence artificial intelligence is being applied to the development of online real-time monitoring 
systems to assist the electricity supply companies' control room engineers. This paper reviews this field and 


parameters are. From the practical point of view, all 
these parameters need to be combined to give an 
overall rating of the system state which relates to the 
need for action by the control engineer. Insights into 
the causes of insecurity or of uneconomic operation 
must also be provided to aid in the selection of 
appropriate actions. 

Many early attempts to provide real-time expert 
system solutions to power system control problems 
have failed due to limitations of software and hardware
technology, e.g. slow response. In the changing
environment today it is important to find solutions to 
the new operating problems that are flexible in the 
face of tremendous uncertainty and where the neces- 
sary high quality of the solution includes adequate 
accuracy and robustness. An adaptable, or 'intelligent',
methodology is required.

Fortunately information technology is growing 
exponentially in terms of computing and communi- 
cation capabilities. Thus microprocessor-based 
machines have been improving in performance at a 
rate of between 1.5 and 2 times per year during the 
last six to seven years. Improvement rates for 
mainframes or minicomputers are about 25 percent 
per year. The performance of fibre-optic links has 
had an order of magnitude improvement every two 
years in the last decade. At the same time prices are 
halved every year. 


Artificial intelligence 

Knowledge processing technology, or 'artificial in- 
telligence', includes expert systems, fuzzy logic, 
neural networks and genetic algorithms. Traditional 
analytical methods start by constructing a precise 
mathematical model of the physical system. For 
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presents two case studies. 


Introduction 

Throughout the world the electricity supply industry 
is facing unprecedented transformation caused by 
the pressures of privatization. Competition has been 
systematically introduced in the generation sector 
and new independent power producers are seeking 
access to transmission networks. Electricity demand 
continues to grow in many countries without a corre- 
sponding increase in transmission and generation 
capacity for environmental reasons. Inter-utility en- 
ergy transfer to take advantage of market economies 
is gaining popularity worldwide. The rising economic
consciousness of electricity consumers in seeking 
cheaper power supplies introduces new system oper- 
ating concepts such as ‘power wheeling’ (or energy 
trading). All these factors cause the power flow 
patterns of many power systems to deviate from the 
patterns for which their transmission networks were 
originally designed. The variability and uncertainty 
of the new power flow patterns, compounded by 
sharp demand increases, reduce transmission operat- 
ing margins and push power networks closer to their 
power transfer and voltage limits, resulting in 
brownouts and blackouts. As a result a series of 
system breakdowns and voltage collapse incidents 
have been reported worldwide during the last two 
decades with increasing frequency. 

An important task in power system operation is to 
decide whether, following an unforeseen disturbance 
(e.g. outage of lines, transformers or generators), the 
system will still remain in a safe operating condition. 
At the same time, it is required to operate the system 
in the most economic manner possible. In order to 
monitor the security and economy of the system, the 
question arises as to what the important performance 
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Synapses pass signals from one neuron to another. If 
the signal entering the nucleus exceeds a threshold,
chemical and electrical changes occur and the neuron
is said to fire. An artificial neural network has
processing elements or nodes, resembling neurons,
and connections or links, resembling synapses. Each
connection has its weight and the weighted sum
enters into the node. The transfer function of the
node is a nonlinear threshold function. The network
may be connected in many different ways. The learning
period of the neural network consists of using
known patterns to adjust the weights. The design of
the neural network includes such tasks as input selection,
network architecture and training. Many
networks have been proposed and both supervised
and unsupervised training have been developed.

Neural networks are being applied to many power
system problems. A CIGRE Task Force recently
analysed 150 publications and found that load forecasting
and dynamic security assessment are the
most popular application areas:


APPLICATION AREA          %

Load forecasting          20.5
Security assessment       19
Control                   15
Fault diagnosis           14
System identification     12
Economic dispatch         7
Alarm processing          2.5
Protection                2
Other                     8
Total                     100
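
The node model described earlier (a weighted sum passed through a threshold function, with weights adjusted from known patterns) can be sketched with a single node. The classic perceptron learning rule is used here as one simple choice of supervised training; the source does not name a particular rule, and the training patterns (a logical AND) are purely illustrative.

```python
def fire(weights, bias, inputs):
    """Node output: 1 if the weighted sum reaches the threshold, else 0."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total >= 0 else 0


def train(samples, epochs=20, rate=1.0):
    """Adjust the weights from known (inputs, target) patterns."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - fire(weights, bias, inputs)
            weights = [w + rate * error * x for w, x in zip(weights, inputs)]
            bias += rate * error
    return weights, bias


# Illustrative known patterns: the node should fire only when both inputs are on.
AND_PATTERNS = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train(AND_PATTERNS)
print([fire(weights, bias, inputs) for inputs, _ in AND_PATTERNS])
```

A single threshold node can only separate patterns with a straight line; practical networks connect many nodes in layers and use other training methods.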


Case study 1: an expert system for security 
monitoring 

Control room engineers require decision support 
software to assist in their online monitoring and 
control of the power system. It is increasingly con- 
sidered that artificial intelligence techniques are best 
placed to address this requirement. A prototype real-time
expert system has been developed by Knight
et al! to assist the control engineers in monitoring the 
most important overall aspects and events on a trans- 
mission network: in particular those involving 
security and economic aspects. The required deci- 
sion support system should be able to carry out 
online analyses of the power system state, identify 
the most important problems, draw these to the atten- 
tion of the control engineers and, where necessary, 
suggest remedial actions. A summary of the development
work is presented here. More detailed accounts
can be found in Knight et al! and in the paper by
Ekwue et al?. 

The security monitor is designed to detect power 
flow, group transfers, voltages or frequency out of 
limits either in the system state or in a contingency 
state. An associated security rectification module is
proposed to suggest actions to remove the problem.
A modular design has been conceived for the expert
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example, in state estimation, the mathematical rela- 
tion between the measurements and the state variables 
is obtained based on physical laws such as Ohm's 
and Kirchhoff's laws. The problem is then formu- 
lated in terms of a standard structure, e.g. weighted 
least square estimation. Solution algorithms for the 
problem are derived and one that is suitable for the 
problem at hand is selected. Computer programming 
is carried out based on the algorithm. The approach is 
highly structured. However, a vast number of problems
do not have exact mathematical models, or the
models do not fit into any structured solutions, or no 
implementable solution algorithms exist. Now hu- 
man knowledge about problems and their solutions 
is often much broader and expert systems have been 
devised to process procedural knowledge in combi- 
nation with data gathered from the physical system. 
Fuzzy logic can be used to process imprecise knowl- 
edge. Finally, neural networks are being developed 
to process knowledge stored as patterns. 
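
As a small illustration of the structured approach, weighted least squares estimation reduces to a weighted average in the simplest case of one state variable measured directly by several instruments. The sketch below assumes that special case, with hypothetical readings and error variances; a real power system state estimator involves many state variables and a nonlinear measurement model.

```python
def wls_estimate(measurements, variances):
    """Weighted least squares estimate of a single directly measured state.

    Each measurement is weighted by the inverse of its error variance, so
    that more accurate instruments count for more.
    """
    weights = [1.0 / v for v in variances]
    weighted_sum = sum(w * z for w, z in zip(weights, measurements))
    return weighted_sum / sum(weights)


# Hypothetical per-unit voltage readings from three meters of differing accuracy.
readings = [1.02, 0.98, 1.01]
variances = [0.0004, 0.0016, 0.0004]
print(round(wls_estimate(readings, variances), 4))
```

This value minimises the sum of squared residuals, each divided by its variance, which is the standard structure referred to in the text.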

An expert system has a knowledge base and an 
inference engine as two of its basic components. The 
knowledge base stores domain-specific expert's 
knowledge about the problem, which may be rule- 
based or model-based. The inference engine contains 
the mechanism for the application of the knowledge 
base to solve the problem. For example, rules in the 
knowledge base may be chained starting from the 
input data until a solution is found (forward chain- 
ing). When more than one rule satisfies the condition 
to be chained next, a conflict resolution scheme must 
be employed. The success of an expert system de- 
pends critically on the knowledge base. Several areas 
in power system operation and control have ben- 
efited from expert system solutions, including fault 
diagnosis, customer restoration, intelligent alarm
processing, substation switching and others. Many
working systems exist today in power system operation.
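
Forward chaining with a conflict resolution scheme can be sketched in a few lines. The rules, fact names and the choice of "highest priority first" as the resolution scheme are all hypothetical; they are not taken from any of the working systems mentioned.

```python
def forward_chain(facts, rules):
    """Chain rules forward from the input data until nothing new can be derived.

    Each rule is (priority, set of conditions, conclusion). When several
    rules are ready to fire, the one with the highest priority is chosen.
    """
    facts = set(facts)
    fired = []
    while True:
        ready = [r for r in rules if r[1] <= facts and r[2] not in facts]
        if not ready:
            return facts, fired
        _, _, conclusion = max(ready, key=lambda r: r[0])  # conflict resolution
        facts.add(conclusion)
        fired.append(conclusion)


# Hypothetical knowledge base for illustration.
RULES = [
    (2, {"line_overloaded"}, "security_alarm"),
    (1, {"line_overloaded"}, "log_event"),
    (3, {"security_alarm", "reserve_available"}, "suggest_redispatch"),
]
_, order = forward_chain({"line_overloaded", "reserve_available"}, RULES)
print(order)
```

With these rules the alarm is raised first, which makes the higher-priority redispatch rule ready; it then fires ahead of the low-priority logging rule.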

Human knowledge is seldom precise. Fuzzy logic 
is a way to deal with our imprecise knowledge. One 
of the rules may be stated as follows: if the tempera- 
ture is just right, you should turn the air speed of the 
air-conditioning system to medium. ‘Just right’ and 
‘medium’ are both fuzzy concepts. For example, if 
the ambient temperature is between 60 and 70, we 
consider it just right. But we can argue that 65 
degrees is perhaps ‘more’ just right than 61 degrees. 
‘Membership functions’ are used to describe these 
fuzzy concepts. Fuzzy logic has recently been pro- 
posed to tackle power system generation scheduling 
problems and transformer fault diagnosis problems. 
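
A membership function for the 'just right' temperature example might look like the following sketch. The triangular shape, with its peak at 65 degrees, is an assumption; the text only requires that 65 degrees be more 'just right' than 61 degrees.

```python
def just_right(temperature, low=60.0, peak=65.0, high=70.0):
    """Degree (0 to 1) to which a temperature counts as 'just right'."""
    if temperature <= low or temperature >= high:
        return 0.0
    if temperature <= peak:
        return (temperature - low) / (peak - low)
    return (high - temperature) / (high - peak)


for t in (61, 65, 69, 75):
    print(t, just_right(t))
```

A fuzzy rule then acts in proportion to this degree of membership rather than switching abruptly at the edges of the 60 to 70 range.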

One of the criticisms of the expert systems ap- 
proach is that the rules are derived from ‘experts’, 
but human experts learn from experience. A neural 
network is an attempt to mimic the way the human 
brain functions, including learning. A human brain 
contains a large number (10 billion) of nerve cells 
called neurons. A neuron has dendrites to carry sig- 
nals (impulses) in and axons to carry signals away. 
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Figure 1. Modular expert system for security monitoring 


and economics. This statement will be brief, re- 
flecting just the most important power system 
states and events. 

4. An X-windows based human-computer interface. 
The most important function will be to display 
clearly and reliably the current statement from the 
third level module. Its secondary function will be 
to allow a convenient display of any required data, 
for example via structured or intelligent means. 
This could include access to raw data from the 
EMS or to the more detailed outputs from the other 
modules. 

The generation and transmission system of the
South West and South Coast of England, including 
such demand and generation data as necessary, as 
shown in Reference 3 for the mid-1990s was used to 
demonstrate a fairly basic assessment of security 
using a commercially available real-time expert sys- 
tem software package. A 20-bus preliminary test 
system based on Reference 3 is shown in figure 2. Some boundaries
have been added (dotted lines) across which
power transfers can be compared with capabilities. 
The circles in figure 2 are used to indicate the power 
status of the particular border concerned. Thus the 
colours of these circles change according to the 
power transfer across these borders in relation to the 
factor limiting the transmission capability (e.g. thermal
capacity, voltage conditions etc.): green for <70%
of limit, yellow for 70-85%, orange for 85-100%, red
for 100-110% and red/blue blink + audio bleep for
>110%. Together with the continuous digital display
of total demand figures as shown in the top left hand 
corner of figure 2, it is also arranged that when any 
system and its outline structure is shown in figure 1.
The design includes a series of modules plus input 
and output as follows: 

1. A coordinating or first level module to control the 
flow of raw and processed data into, within and 
out of the system. This may well be based on a 
‘system database’ or ‘blackboard’ taking initial 
data from the Energy Management System (EMS) 
and elsewhere and becoming enriched by the other 
processes within the expert system. An ‘object-oriented’
database is being investigated.

2. A series of second level modules, to perform
functions such as the following. There is no im- 
plied constraint that each function must be 
implemented by a separate expert system module 
if an alternative is found more appropriate: 

• alarm analysis
• transmission and generation security monitor (under development)
• economic operation monitor (under development)
• ‘deviation from expected conditions’ monitor
• emergency situation status
• security rectification
• ‘control facilities status’ overview

It is possible that further functions may be added,
as the user requirements evolve over time. These
could for example include advice on voltage control
and restoration.

3. A third level module, the overall system state 
monitor, which will coordinate the output from the 
second level expert systems and other data sources 
into a composite statement of the overall state of 
the system, including in particular both security 
Figure 2. Test system: South West and South Coast of England
• how is the performance affected by choice of model?
• what is the best architecture for a certain problem?
• how can the neural network be trained efficiently?

The neural network model used in this study is
the multi-layered feedforward model trained by the
well-known backward error propagation algorithm.
A commercially available, user-friendly software
package was used. The approximation of the functional
relationship between the power system
operating conditions and the voltage collapse index
is achieved by the training process in which the
weights and thresholds of the neural network are
adjusted until the desired associations have been
made.
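
To make the training process concrete, the following is a minimal sketch of a feedforward network trained by backward error propagation, written in plain Python. It is not the commercial package used in the study. The network size, learning rate and training data are all assumptions: the samples are a hypothetical smooth curve standing in for the relationship between an operating condition and a voltage collapse index.

```python
import math
import random


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class TinyFeedforwardNet:
    """One input, one hidden layer of sigmoid units, one linear output."""

    def __init__(self, n_hidden: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.w1 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]  # input-to-hidden weights
        self.b1 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]  # hidden thresholds
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]  # hidden-to-output weights
        self.b2 = 0.0                                                # output threshold

    def forward(self, x: float):
        hidden = [sigmoid(w * x + b) for w, b in zip(self.w1, self.b1)]
        output = sum(w * h for w, h in zip(self.w2, hidden)) + self.b2
        return hidden, output

    def predict(self, x: float) -> float:
        return self.forward(x)[1]

    def train_step(self, x: float, target: float, lr: float) -> None:
        """One backward error propagation update for a single sample."""
        hidden, output = self.forward(x)
        error = output - target  # gradient of 0.5 * error**2 at the output
        for j, h in enumerate(hidden):
            # Propagate the error back through unit j before changing w2[j].
            grad_hidden = error * self.w2[j] * h * (1.0 - h)
            self.w2[j] -= lr * error * h
            self.w1[j] -= lr * grad_hidden * x
            self.b1[j] -= lr * grad_hidden
        self.b2 -= lr * error


def mean_squared_error(net: TinyFeedforwardNet, samples) -> float:
    return sum((net.predict(x) - t) ** 2 for x, t in samples) / len(samples)


# Hypothetical training set: (normalized operating condition, index value).
samples = [(i / 10, (i / 10) ** 2) for i in range(11)]

net = TinyFeedforwardNet(n_hidden=3)
loss_before = mean_squared_error(net, samples)
for _ in range(2000):
    for x, target in samples:
        net.train_step(x, target, lr=0.1)
loss_after = mean_squared_error(net, samples)
print(f"mean squared error before: {loss_before:.4f}, after: {loss_after:.4f}")
```

The loop over thousands of passes is the costly part. Once the weights are fixed, `predict` needs only a handful of multiplications, which is why evaluation is fast enough for online use.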

The methodology was applied to the NGC 48-bus 
equivalent system described in Reference 5. To match 
the given loading profile a typical generation dis- 
patch schedule was constructed. A loading test was 
conducted on the test system to determine the proximity
to voltage collapse. Full details of the technical
assumptions, theoretical analysis and training meth- 
ods are given in*. It was found that the neural network 
training process is very computationally demanding. 
However, the method takes negligible time to evalu- 
ate voltage stability once the neural network has 
been trained. A high performance efficiency sug- 
gests that this approach could be employed 
successfully online. 


Comparison of neural networks and expert systems

The problems encountered with neural networks, in
the case study described in the previous section, for
example, are widely discussed in the literature and
include:

• dimensionality: many sets of operating points are
  necessary to approximate non-linear functions
  involving numerous parameters
• approximation logic
• optimum configuration selection
• choice of training methodology
• convergence difficulties
• opaqueness of neural network operation
• ‘black box’ representation of the power system
  and consequent loss of physical insight
• design inflexibility: the neural network must be
  re-trained whenever the physical system changes
• fuzziness: a result will always be generated even
  with unreasonable input data. This fuzziness can
  make the neural network approach psychologically
  unacceptable to pragmatic engineers.

By combining a neural network and an expert 
system the individual strengths of each computational 
paradigm (the neuron-like connectionistic processing 
paradigm of the neural network and the symbolic 
processing paradigm of the expert system) can be 
exploited and utilized. In the hybrid scheme, the 
expert system carries out high-level monitoring, 
diagnosis or planning, whereas the neural network is 
circle is clicked upon using the mouse, a sub-screen
comes up showing details of the particular power
transfer across that border.
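
The colour bands used for the boundary circles in figure 2 amount to a simple threshold rule, sketched below. The function name and the treatment of values falling exactly on a band edge (70%, 85%, 100%, 110%) are assumptions, since the text does not say which band an edge value belongs to.

```python
def boundary_status(transfer_mw: float, limit_mw: float) -> str:
    """Map a boundary power transfer to the display colour described in the text."""
    if limit_mw <= 0:
        raise ValueError("limit_mw must be positive")
    loading = 100.0 * transfer_mw / limit_mw  # transfer as a percentage of the limit
    if loading < 70:
        return "green"
    if loading < 85:
        return "yellow"
    if loading < 100:
        return "orange"
    if loading <= 110:
        return "red"
    return "red/blue blink + audio bleep"


for transfer in (50, 80, 95, 105, 120):
    print(transfer, boundary_status(transfer, 100))
```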

Development continues on other prototype mod- 
ules, including economic operation, for example. 
A major on-going research task is the validation 
and verification of knowledge based systems such 
as this. 


Case study 2: applications of neural networks to 
voltage collapse monitoring
State-of-the-art artificial neural networks are being 
applied to online monitoring tasks such as monitor- 
ing the risk of voltage collapse in electricity supply 
systems so as to relegate the complicated calcula- 
tions required in online voltage security assessment 
to offline repetitive power system simulations and 
recursive iteration processes in neural network train- 
ing. A feasibility study has been described by Short 
et al ^. A summary of the work is presented here. 
Mathematical and technical details can be found in*. 
Causes of voltage collapse in modern electricity 
supply systems are: 
• growth of consumer demand
• insufficient generation and transmission capacity
• high inter-utility energy transfer
• new operation such as power wheeling
• access of independent new power producers and
  cogenerators.
For voltage collapse analysis we have to look for 
methods of calculating the distance between the 
system operating point and the critical point. The 
distance is called the stability margin of the power 
system. Numerous voltage collapse indices have 
been proposed and examined by the many workers in 
this field. The mathematical modelling and analysis 
is difficult and computationally demanding. 
Researchers claim, and endeavour to demonstrate, 
that the advantages of neural network computing 
methodologies over conventional approaches include: 
• faster computation
• learning ability
• adaptive features
• data handling in large volumes in a parallel manner
• robustness: i.e. having greater fault tolerance
• noise rejection and fuzziness
• generalization (interpolation and extrapolation)
Although research on artificial neural networks 
has been in progress for many years, applications of 
neural networks to the solution of a wide range of 
electricity supply system problems have a relatively
short history and it was not until the early 1990s that
significant publications began to appear.

In the case study discussed here“ the design of the
neural network includes consideration of:
• input selection: how many input variables?
• network architecture: Perceptron, Hopfield, Kohonen, ...?
• training method: supervised, unsupervised, ...?
Questions about neural networks include: 
• if an expert system cannot solve an alarm problem,
  you get no result
• a neural network might output something
• an expert system is able to explain results
• a neural network cannot explain, except by building
  an ‘inverse net’ in which outputs and inputs
  are inverted to find out what caused an event.


Conclusion 
In this paper the author has attempted to review the 
wide and very active field of applications of artificial
intelligence to the electricity supply industry. The 
coverage has necessarily been selective but it is 
hoped that some flavour of the current state-of-the- 
art and its problems has been conveyed — arising 
from the author's involvement in two industry/uni- 
versity collaborative teams which conducted the case 
studies discussed above. Research, development and 
prototyping continue in a step-by-step manner. 
Undoubtedly the future holds much promise: thus 
another example of application programming takes 
lessons from biology. A ‘genetic algorithm’ is an 
approach to solving problems based on Darwin's 
evolution theory of natural selection and the survival 
of the fittest. Those who survive will reproduce and 
the crossover makes the new born generation differ- 
Figure 3. Hybrid expert system/neural network monitor


used to evaluate local problems having great analyti- 
cal complexity. A hierarchical structure to monitor 
voltage collapse is shown in figure 3 in which the 
neural network focuses on a sub-set of the problem 
space and so has a required training data set that is 
reduced to manageable size. In this approach to NGC 
voltage collapse monitoring the multi dimensionality 
problem of covering a wide range of power system 
operating conditions ranging from light overnight 
summer loads to maximum peak load in winter, to- 
gether with typical combinations of line and generator 
outages, can be solved by using the expert system in 
the first stage to identify the sub-space in which the 
NGC system is currently operating. An appropriately 
trained neural network can then be selected to deliver 
the online voltage collapse evaluation. Further work 
in this field is currently in progress.
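
The two-stage hybrid scheme can be sketched as a rule-based selector that picks one of several separately trained networks. The regime names, the rule thresholds and the stand-in 'networks' (plain functions here) are all hypothetical. In practice each entry would be a network trained only on data from its own sub-space.

```python
from typing import Callable, Dict, Optional


def identify_subspace(season: str, demand_fraction: float, outages: int) -> Optional[str]:
    """Expert system stage: hypothetical rules naming the current operating regime."""
    if outages > 2:
        return None  # outside every trained regime, so no network is selected
    if season == "summer" and demand_fraction < 0.4:
        return "light_summer_night"
    if season == "winter" and demand_fraction > 0.9:
        return "winter_peak"
    return "intermediate"


# Stand-ins for neural networks, each trained on one sub-space only.
TRAINED_NETWORKS: Dict[str, Callable[[float], float]] = {
    "light_summer_night": lambda demand: 0.2 * demand,
    "winter_peak": lambda demand: 0.9 * demand,
    "intermediate": lambda demand: 0.5 * demand,
}


def voltage_collapse_index(season: str, demand_fraction: float, outages: int) -> Optional[float]:
    """Neural network stage: evaluate with the network chosen by the rules."""
    regime = identify_subspace(season, demand_fraction, outages)
    if regime is None:
        return None  # the expert system declines rather than guessing
    return TRAINED_NETWORKS[regime](demand_fraction)


print(voltage_collapse_index("winter", 1.0, 0))
print(voltage_collapse_index("summer", 0.3, 5))
```

Returning no result when the rules do not recognize the situation keeps the network from being asked about conditions it was never trained on.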

A comparison of neural networks and expert 
systems appears to lead to the following conclu- 
sions: 

• when you build an expert system you have to
  know what is going on
• with a neural network you do not need to know,
  e.g. you can build a simulator, or dynamic model
  of a plant, without knowing any of its internal
  mechanisms, equations etc.
ent. Occasionally mutation happens. Genetic programming
is an automatic procedure to generate
programs to solve problems and can be expected in 
the future to solve some electricity supply system 
problems. While we must appreciate that each form 
of artificial intelligence has inherent disadvantages 
as well as advantages, interesting and powerful forms 
of artificial intelligence can sometimes be derived by 
skilful use of two (or more) methodologies integrated
within a hybrid structure (e.g. expert system
plus neural network array). Further development and 
evaluation of intelligent monitoring systems will 
continue until the difficulties have been overcome to 
a sufficient extent that validated and verified intelli- 
gent systems can be commissioned online with a 
degree of confidence that is acceptable to the end- 
user: the human supervisor or control engineer. 
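
The genetic algorithm mentioned above (survival of the fittest, crossover, occasional mutation) can be sketched on a toy problem. The task here, maximizing the number of 1s in a bit string, is a hypothetical placeholder for a real power system objective, and the population size, mutation rate and the choice to carry the best individual forward unchanged are assumptions.

```python
import random
from typing import List


def fitness(bits: List[int]) -> int:
    """Toy objective: count the 1s in the bit string."""
    return sum(bits)


def evolve(n_bits: int = 20, pop_size: int = 30, generations: int = 40,
           mutation_rate: float = 0.02, seed: int = 1) -> List[int]:
    """Return the best fitness found in each generation."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best_per_generation = [max(map(fitness, population))]
    for _ in range(generations):
        # Survival of the fittest: only the better half may reproduce.
        population.sort(key=fitness, reverse=True)
        survivors = population[: pop_size // 2]
        children = [survivors[0][:]]  # keep the current best unchanged
        while len(children) < pop_size:
            mother, father = rng.sample(survivors, 2)
            cut = rng.randint(1, n_bits - 1)
            child = mother[:cut] + father[cut:]  # crossover
            # Occasional mutation: flip a bit with small probability.
            child = [1 - b if rng.random() < mutation_rate else b for b in child]
            children.append(child)
        population = children
        best_per_generation.append(max(map(fitness, population)))
    return best_per_generation


history = evolve()
print("best fitness, first and last generation:", history[0], history[-1])
```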
The future of electronic publishing for book publishers


because they are cheaper and offer greater coverage 
and currency. Individual consumers are similarly 
swayed. People need relevant information now, and 
cannot afford to wait until a book is published. 


How can electronic publishing help book 
publishers? 

Electronic publishing offers a way of transforming 
the present vicious circle into a more benign, even 
beneficial one. Information can be handled much 
more efficiently in electronic form than in print. 
Through the use of Hyperlinking and powerful 
Boolean search and retrieval engines, consumers 
now can feel that they are in control of large databanks 
of information. Swamped by the huge mass of printed 
matter available, disaffected information-seekers 
should easily be won over by electronic publishing. 

Electronic publishing can help book publishers 
diversify. Until recently the author supplied a type- 
script that would be typeset, proofed, made into 
camera-ready-copy, printed and bound into book- 
form; from now on, the author will present the 
information in electronic form (text, pictures, sound
and video) and this collection will be proofed, ge- 
nerically tagged, assembled into a database and made 
available to the consumer through a variety of media 
— books, CD-ROMs and online. This ‘single-source, 
multiple-delivery’ concept provides the publisher 
with valuable dual or even tertiary income streams.
‘Single-source, multiple-delivery’ products should
complement (rather than compete with) each other 
with each medium attracting different market niches;
indeed, the creation of new markets is seen as the
greatest benefit of electronic publishing. 

Electronic publishing can also provide long-term 
cost benefits. While short-run book publishing is 
inhibited by high paper and print costs, CDs cost at 
the most £1 each to press and information products 
sent online do not need individual packaging. Further- 
more, while paper and print costs are likely to 
continue to rise as environmental concerns begin to 
bite, the prices of electronic components are falling 
all the time. 


How can the consumer be persuaded to go 
‘electronic’? 

The printed book is still very popular. It is very easy
to use, is totally intuitive and it can be read anywhere,


within Britain 
Ricky Leaver 


Keswick Road, Putney, London 


Introduction 
The electronic publishing revolution has arrived. 
Some book publishers have embraced it willingly; 
some fearfully and reluctantly; some are ignoring it
altogether. Few, however, realize the huge impact it
will have on the whole world of mass media.

This article outlines the challenges and opportunities
electronic publishing can offer book publishers.

Research for this article was carried out via a ques- 
tionnaire sent to 95 academic and professional book 
publishers (46 usable responses), follow-up inter- 
views with Addison-Wesley, Nelson, Kogan Page, 
Meckler, Berlitz and Electronic Publishing Services 
Ltd and by trawling through the current literature. 
This article is a précis of a dissertation submitted as 
part of an MSc course in Information Systems and
Technology at City University. 


Is the economic basis of the book publishing 
industry fundamentally flawed? 

Making money from book publishing is becoming 
increasingly difficult. Why? Competition to capture 
most markets is becoming fiercer among rival pub- 
lishers and has created a splintering effect on the 
market, with publishers desperately trying to dis- 
cover and capitalize on new niches. These new niche 
areas have increasingly smaller markets and demand 
lower print runs which in turn necessitates setting 
book prices high to recover fixed editing, typesetting 
and printing costs. As each new title provides a rather
small return on investment, the publisher is forced to
push many different titles into the marketplace to
create one large collective return and, although con- 
sumers are still buying as many books as previously, 
this flood of books is outstripping demand, thereby 
creating a vicious circle — as the marketplace be- 
comes increasingly saturated, a greater percentage of 
books will remain unsold. Publishers will arrive at a 
situation when only a few of their titles are breaking 
even. 
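
The link between shrinking print runs and rising prices can be seen in a short break-even calculation. The cost figures below (fixed costs of £15,000 and a unit cost of £3) are hypothetical. Only the sales figures of 1,500 and 500 copies echo the monograph example in this article.

```python
def break_even_price(fixed_costs: float, unit_cost: float, copies_sold: int) -> float:
    """Price per copy at which revenue just covers fixed and per-copy costs."""
    if copies_sold <= 0:
        raise ValueError("copies_sold must be positive")
    return unit_cost + fixed_costs / copies_sold


FIXED_COSTS = 15_000.0  # hypothetical editing, typesetting and set-up costs (pounds)
UNIT_COST = 3.0         # hypothetical paper, print and binding cost per copy (pounds)

price_at_1500 = break_even_price(FIXED_COSTS, UNIT_COST, 1500)
price_at_500 = break_even_price(FIXED_COSTS, UNIT_COST, 500)
print(f"break-even price at 1,500 copies: {price_at_1500:.2f}")
print(f"break-even price at 500 copies:   {price_at_500:.2f}")
```

With these assumed costs, cutting sales to a third more than doubles the price needed to break even, because the fixed costs are spread over far fewer copies.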

Consumers’ needs are changing. The monograph 
used to be the bread and butter of the publishing 
industry. It could be expected to sell 1,500 copies and 
thus break even. Nowadays publishers would be lucky 
to sell more than 500. What has happened to the main 
consumers of these monographs, the libraries, for 
example? They now look towards different media: 
journals, newspapers and magazines are more popular 
Software requirements

Well-written software will be integral to the success 
of an electronic product as it will offer something 
unique to the consumer — full multimedia (text, 
pictures, animation, sound and video). However, the 
multimedia content must be applied meaningfully 
and match each market's aspirations. 

The software should be idiot-proof. It should be 
completely intuitive and as easy to use as reading a
book. Although the Windows graphical user inter- 
face is fast becoming the standard for electronic 
products, Microsoft may soon be replacing it with a 
new interface aimed at the computer-illiterate mar- 
ket. Code-named ‘Utopia’, it will use pictures of 
every-day objects to act as metaphors for certain 
tasks.

Most publishers questioned were adamant that 
the software must also promote easy manipulation 
and interaction whereby the consumer controls the 
direction and pace of the product. Information re- 
trieval should be facilitated through the use of Boolean 
keyword search or statistical methods such as 
Probabilistic Term Weighting. The navigational
method of Hyperlinking can aid interaction, whereby 
special or interesting material within one body of 
information can be 'linked' directly to another body 
of associated material as well as improving the cus- 
tomer's ability to browse. Successful interaction 
should attract as much user involvement as possible. 
Nelson, for their Spanish language product En 
Marcha, provide a microphone with every pack so 
that the user can record and then listen to his or her 
own voice from the computer. 
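
As an illustration of Boolean keyword search, the sketch below builds a simple inverted index and answers AND, OR and NOT queries with set operations. The three sample 'documents' and all function names are hypothetical. A commercial retrieval engine would add stemming, phrase handling and term weighting on top of this.

```python
from typing import Dict, Set


def build_index(documents: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each lower-cased word to the set of documents containing it."""
    index: Dict[str, Set[str]] = {}
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index.setdefault(word.strip(".,;:!?"), set()).add(doc_id)
    return index


def search_and(index: Dict[str, Set[str]], *terms: str) -> Set[str]:
    """Documents containing every term."""
    sets = [index.get(term.lower(), set()) for term in terms]
    return set.intersection(*sets) if sets else set()


def search_or(index: Dict[str, Set[str]], *terms: str) -> Set[str]:
    """Documents containing at least one term."""
    return set().union(*(index.get(term.lower(), set()) for term in terms))


def search_not(index: Dict[str, Set[str]], all_ids: Set[str], term: str) -> Set[str]:
    """Documents that do not contain the term."""
    return all_ids - index.get(term.lower(), set())


docs = {
    "d1": "Voltage collapse in power systems",
    "d2": "Neural networks for voltage monitoring",
    "d3": "Expert systems for alarm analysis",
}
index = build_index(docs)
print(search_and(index, "voltage", "collapse"))
print(search_or(index, "neural", "expert"))
print(search_not(index, set(docs), "voltage"))
```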

Due to the huge storage capacity of digital media, 
the consumer can be given immediate access to a 
larger body of data. For instance, Oxford University
Press has produced the whole of the multi-volume
Oxford English Dictionary on a single CD-ROM. In
the future, it is possible that publishers may make 
their whole list available online for consumers to 
'cherry-pick' desired information. 

The software should react promptly to the user's 
keyboard and mouse commands and screen feedback 
should be instantaneous. Quick response time can be 
achieved by writing tight software for the electronic 
product. Dorling Kindersley write special custom- 
ized code, so that yesterday's less powerful computers 
(386 25MHz PCs) can successfully harness their 
products. 

One of the joys of reading from print-based material
is that the consumer can easily skim through the
text and occasionally unearth a ‘pearl’ of informa- 
tion. Electronic products need to emulate this. So far 
rudimentary scroll bars and the limited number of
words that fit on a screen hinder successful browsing.
Hyperlinking software will improve the
consumer's ability to browse and higher screen reso- 
lutions will allow smaller and clearer type. 

The software should allow the consumer to cus- 
tomize the electronic product to his or her own needs. 
anytime. Many people choose to display their book
collections, and each title is seen as a tangible
possession. Also, consumers are conservative by
nature and will be reluctant to abandon the book in
favour of screen-based products. From the responses 
in the questionnaire, consumers' familiarity with 
print is seen as a major factor behind the market's 
reluctance to accept electronic publishing. What 
requirements, both hardware and software, are 
needed to effect this change? 


Hardware requirements 

The hardware that delivers the electronic product to 
the consumer needs to match the physical qualities of 
a book. The screen must offer easy readability to the 
consumer. CRT (Cathode Ray Tube) technology can 
now offer acceptable flicker-free, high resolution 
screens for desk-bound monitors. The LCD (Liquid 
Crystal Display) technology that is used for portable 
computers cannot yet match this screen resolution. 
Further developments are needed: larger screen sizes
for demonstrations; and screens that can adapt to 
sudden changes of background brightness and lumi- 
nescence. 

Ideally the hardware should be as portable as 
possible in order to increase accessibility. Portability 
is improving through the continuing development of 
the flat-back LCD screen technology while the fast 
increasing transistor circuit-density of semi-conduc- 
tors facilitates the miniaturization of hardware. All 
hardware manufacturers are developing laptop and 
even palmtop computers: Sony’s Data Discman can 
take small CD-ROMs; Knight-Ridder and Associ- 
ated Newspapers are developing electronic ‘tablets’ 
to support a portable electronic news service. 

The standard input tools to the hardware are the 
keyboard and mouse. While these promote interac- 
tion, they are not completely intuitive, whereas speech 
recognition would offer a more natural method. 
Present commercial systems can recognize up to 
120,000 words, but need to be improved in order to: 
recognize all words in a dictionary; cope with any 
accent or language; and understand continuous 
speech. Other input technologies worthy of consid- 
eration are touchscreen and hand-writing recognition. 

PC or TV? Publishers have a choice of hardware 
platform: they prefer the PC as it is more ‘intelli- 
gent’, promotes greater interactivity and is more 
text-based (from the questionnaire, not one publisher 
chose the TV as his or her preferred platform). The 
PC and TV, however, are moving closer together: the 
new Apple Macintosh LC630 comes complete with a
television tuner that can access many different TV 
channels; and Philips and Panasonic are developing 
their own ‘Interactive’ TVs that can act as platforms 
for their respective CD-I and 3DO products. It is
possible that the PC and TV will converge in a single 
all-purpose hybrid: Packard Bell are already pio- 
neering a system that is a PC, TV, radio, telephone 
and fax all in one. 
can update original disks that they have already
sold to consumers, ideal for annuals such as The
Good Pub Guide.

There is confusion over CD-ROM standardiza- 
tion. As soon as a standard has been 'rubber- 
stamped' by the ISO, CD technology gains another 
dimension, and a new standard is required. At the 
moment there is a battle between the 'Frankfurt 
Proposal’ and Photo CD Multisession to be the 
standard for multisession CD-recordable, while the 
desire for high-density CD-ROMs will create the 
need for yet another standard. Furthermore, there is 
the incompatibility between the PC-based CD-ROM 
and the TV-based CD-I. Universal standardization 
is badly needed to promote the mass production 
of CDs. 

CD-ROM publishing has two different pricing 
strategies. The individual consumer will see the CD 
as a possession, and thus prefer to pay a one-off fee 
for it. The group consumer, such as an organization, 
may prefer to borrow rather than buy the information.
Group consumers would be charged on a 'pay
per view' basis whereby any time the CD is being
read, the usage is metered via a modem link. 

As both general consumers and booksellers may 
be unfamiliar with CD-ROM products, publishers
need to mount a large-scale retailing and marketing
strategy to galvanize the market. Sales forces need to
demonstrate the electronic products to the booksell- 
ers, who in turn must coax the general consumers to 
'test-run' them within the shops. Publishers should
look also to other retail outlets, such as electronic 
goods retailers, record shops, newsagents and super- 
markets to boost their high street presence. In the 
future, specialist electronic-only 'bookshops' may
emerge. These shops would not have to cope with the 
stock problems that ordinary bookshops face as only 
one master copy of each title would have to be kept; 
if a consumer wanted to buy a particular title, this 
could be copied immediately from the master copy 
on to a blank CD. 

It is possible that the CD-ROM technology will 
become obsolescent in the near future. Once consid- 
ered to be its greatest strength, its storage capacity 
now cannot deal with an adequate amount of inter- 
leaved full-screen video and sound. There are two 
new technologies, albeit some way off commercial 
usage, that could take over: the first is frequency 
domain optical storage, which can pack ten gigabytes 
into one square centimetre of recording material; the 
second is holography, where ‘photorefractive’ poly- 
mer films, the size of a two pence coin, can store 
several billion bits of information. 

PCMCIA cards are another possible alternative 
to CD-ROM. The size of credit cards, these can be 
plugged into laptops and palmtops, offering low 
price/high penetration information such as maps, 
thesauri and foreign phrase guides. 

From the results of the questionnaire and inter- 
views, book publishers have shown a strong 
For instance, Nelson allow their consumers to cannibalize
their favourite bits from any of their CDs so
that they can create their own ‘scrapbook’ on to a 
floppy disk provided. Going far beyond this, the 
consumer could have access to a publisher’s com- 
plete list and cull all the information necessary to 
produce his or her own definitive text. 

Although development costs will be high, the 
electronic product must be priced attractively to gain 
a foothold in the mass-market. For middle-range 
products the retail price needs to be under £50 for 
average consumers to part readily with their money. 
If and when mass-consumerism does embrace elec- 
tronic publishing, it will be interesting to see the 
resulting price reductions and whether the price of 
the electronic product will fall below that of its 
traditional counterpart.


Delivery of electronic product: offline or online? 
Publishers have a choice of how to deliver their 
electronic products. Do they package them as CDs, 
and sell them offline? Or do they send them direct to 
the consumer online via Cable TV networks or the 
Internet? It will be seen that offline and online offer 
contrasting attractions and constraints for both pub- 
lishers and consumers. 


Offline 

CD-ROM has a massive digital storage capacity of 
up to 680 Mbytes. It can store more than 470 times the amount
of information of an average 1.44 Mbyte floppy
disk. CD-ROM storage capacity, however, still needs 
to be improved to provide the consumer with more 
video content. Video takes up a huge amount of 
memory with the result that most products currently 
on the market have a maximum of only 20 minutes 
of video, whose appearance on the screen is only 
beer-mat size and whose resolution is some way 
below TV standard. The situation is being improved 
in two ways: firstly, through the use of MPEG 
(Motion Pictures Expert Group) Video Data 
Compression Standard, 72 minutes of full-screen 
size video can now be squeezed on to one CD; 
secondly, and in the longer term, Philips are pioneering
a high-density CD-ROM which holds 3.3
Gigabytes, about five times the current 680 Mbyte
standard. 
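
The capacity ratios in this passage can be checked with a few lines of arithmetic. The only assumption is the conversion of 1 Gigabyte to 1,000 Mbytes. A 1,024-based conversion changes the second ratio only slightly.

```python
CD_ROM_MB = 680.0                 # CD-ROM capacity quoted in the text
FLOPPY_MB = 1.44                  # floppy disk capacity quoted in the text
HIGH_DENSITY_MB = 3.3 * 1000      # 3.3 Gigabytes, assuming 1 Gigabyte = 1,000 Mbytes

floppies_per_cd = CD_ROM_MB / FLOPPY_MB
density_gain = HIGH_DENSITY_MB / CD_ROM_MB

print(f"one CD-ROM holds about {floppies_per_cd:.0f} floppy disks' worth of data")
print(f"the high-density disc holds about {density_gain:.1f} times a standard CD-ROM")
```

On these figures one CD-ROM is equivalent to roughly 470 floppy disks, and the high-density disc holds a little under five times the current standard.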

CD-ROMs, unfortunately, can be very slow to 
deliver their information to the screen, and thus do 
not provide instantaneous feedback for the user. 
NEC, however, have produced a drive that can pro- 
vide a bandwidth of 600Kbytes per second, which is 
4 times faster than the first CD-ROM drives. 

Publishers can now press CDs in-house due to 
the new CD-recordable technology. For under 
£3,000 (and getting cheaper all the time) one can 
buy a CD-ROM recorder and economically pro- 
duce runs of 100 copies or less. CDs pressed in this 
manner can be written incrementally in many ses- 
sions (multisession). This means that publishers 
(through optical fibre cabling) of the Cable TV medium.
Companies such as Prodigy and America
Online are looking to supply information services to 
supplement the main Cable TV channels. In time, 
however, the Cable TV network probably will be- 
come part of the Internet. 

Many book publishers are wary of selling infor- 
mation online on the Internet. They cannot predict 
how it will evolve or who will regulate it and they are 
worried about successful collection of revenue. They 
are also fearful that entering the online publishing 
market will put them in direct competition with 
natural online performers such as powerful TV and 
telecommunications companies. These fears are jus- 
tifiable, but online publishing will definitely happen 
and the opportunities are too good to ignore. 


What future technological improvements can 
help electronic publishing? 

The construction of information superhighways is 
being aided by the use of optical fibre cabling replacing
the traditional copper wire as the transmission
medium. Optical fibre transmission operates at a very
high frequency and thus can provide a much larger
throughput of data. Optical fibre was restricted to
telephone trunk lines but now, due to the competition
between different cable and telecommunications com- 
panies, it is beginning to extend to the ‘local loop’ 
(the connection between the local exchange and the 
home). The information superhighway is also being 
boosted by the introduction of ISDN, which is a 
digital upgrade of the old analogue telephone net- 
work, and Bill Gates's planned ‘teledesic’ network 
involving 840 low-orbital satellites covering 95 per 
cent of the earth’s surface. 

Data compression will help to squeeze more 
video on to the desktop. As already discussed, due 
to the MPEG Video Data Compression Standard, 
compression ratios of 180:1 have led to 72 minutes 
of video being available on one CD. Unbelievably, 
even larger compression ratios are being devel- 
oped; Frax have managed to compress 100 pages of 
near-photographic colour images on to one floppy 
disk. 
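
The MPEG figures quoted above imply the following data volumes and rates. This is a rough sketch that takes the 180:1 ratio, 72 minutes and 680 Mbytes at face value and assumes 1 Mbyte = 1,000 Kbytes.

```python
CD_ROM_MB = 680.0          # capacity of one CD, from the text
COMPRESSION_RATIO = 180.0  # MPEG compression ratio, from the text
VIDEO_MINUTES = 72.0       # playing time on one CD, from the text

seconds = VIDEO_MINUTES * 60
uncompressed_mb = CD_ROM_MB * COMPRESSION_RATIO       # size before compression
compressed_kb_per_s = CD_ROM_MB * 1000 / seconds      # rate the drive must sustain
uncompressed_mb_per_s = uncompressed_mb / seconds     # rate without compression

print(f"uncompressed video: about {uncompressed_mb / 1000:.0f} Gigabytes")
print(f"compressed stream: about {compressed_kb_per_s:.0f} Kbytes per second")
print(f"uncompressed stream: about {uncompressed_mb_per_s:.1f} Mbytes per second")
```

Without compression the same 72 minutes would need well over a hundred Gigabytes, far beyond any disc discussed in this article.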

The improvements in flat-back screen technologies
and the miniaturization of hardware have also
been discussed earlier.

Microprocessor power has increased dramatically 
with the birth of a new generation of computers. The 
Pentium-based PC and the PowerPC both perform at 
about 90MHz, compared to the old generation of 386 
and 486 PCs (25MHz and 33MHz). These new PCs, 
which are competitively priced, can cope much more 
efficiently with mixed-media data. 


What new challenges will electronic publishing 
create for publishers? 

Through the advent of CD-recordable technology, 
anyone can pirate stolen information on to blank 
CDs, and re-sell the information very cheaply. Pub- 
preference for offline over online delivery. Why?
Firstly CD-ROMs are favoured because, like books, 
they yield ownership to the consumer. Secondly, 
CD-ROM publishing does not drastically alter the 
way traditional publishers run their businesses; gath- 
ering of content and manufacturing may be different, 
but pricing strategies, collection of revenue and chan- 
nels of distribution will remain the same as before. 


Online 

The Internet will be the most likely medium for 
online publishers. At the moment it is a sprawling 
free-form ‘network of networks’ that allows world- 
wide online communication and acts as a free 
information forum, predominantly for the academic 
world; it is now changing by firstly encouraging 
business and home users to hook on to it, and sec- 
ondly by allowing publishers to market information 
on it. 

The three main advantages of using the Internet 
as a publishing medium are its connectivity, capacity 
and currency: twenty million people (increasing by 
15 per cent a month) have access to the Internet; the 
volume of mixed-media information is not con- 
strained by the limited storage capacity of CD-ROM; 
and all information can be updated seamlessly. 
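
To show what growth of 15 per cent a month means, the sketch below compounds the quoted figure of twenty million users. It assumes the rate stays constant, which is a simplification for illustration and not a forecast.

```python
import math

USERS_NOW = 20_000_000   # figure quoted in the text
MONTHLY_GROWTH = 0.15    # 15 per cent a month, from the text


def projected_users(months: int) -> float:
    """Users after the given number of months at a constant compound rate."""
    return USERS_NOW * (1 + MONTHLY_GROWTH) ** months


after_one_year = projected_users(12)
doubling_months = math.log(2) / math.log(1 + MONTHLY_GROWTH)

print(f"after 12 months: about {after_one_year / 1e6:.0f} million users")
print(f"doubling time: about {doubling_months:.1f} months")
```

At this rate the user base doubles roughly every five months and grows more than fivefold in a year.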

Sending the information product direct to the 
consumer has two advantages: (i) there is no need for
the usual offline product distribution chain; and (ii) 
expensive warehousing space and transportation costs 
would be cut. 

Pricing strategies for online publishing will be 
different from those for offline as the information 
cannot be bought but only ‘borrowed’. A simple 
subscription scheme could be used; for instance, the 
Encyclopedia Britannica is available on the World 
Wide Web for a flat monthly fee (about $25). Other 
payment methods are pay-per-view or payment by 
quality or quantity of information downloaded; both 
would involve a metering system connecting the 
consumer to the publishing ‘host’. 

Although the Internet reaches an almost infinite 
audience, it will take considerable skill for publish- 
ers to market their products successfully: many 
‘surfers’ on the Internet choose to remain anony- 
mous; and dropping advertising and market 
information in appropriate bulletin boards (special- 
ist information forums) is frowned upon by the 
Internet community. The best strategy is for pub- 
lishers to market and make available their products 
through the World Wide Web. This increasingly 
familiar information service uses a friendly hypertext 
layout and organizes information into different sub- 
ject areas. It has just incorporated a commercial 
sector, in which there is a compartment called ‘Elec- 
tric Press’ where publishers can market their 
catalogues and demonstrate and sell their electronic 
products. 

Some information providers may prefer the less 
chaotic nature and greater bandwidth capabilities 
Reference (as seen from the questionnaire results) is 
the biggest market and caters for the epic mass 
market titles such as Microsoft's Encarta, low price/ 
high penetration products such as maps or spell- 
checkers, and specialist niche titles such as Meckler's 
CD-ROMs in Print. The interactive and mixed-me- 
dia nature of electronic publishing is ideally suited to 
the education market at all levels. A good example of 
a primary school product is Nelson’s reading and 
writing scheme entitled Flying Boot. Entertainment 
and children will be the emerging general consumer 
markets which are likely to explode into mass con- 
sumerism shortly, and will promote a wealth of 
‘edutainment’, lifestyle and games-oriented titles. 
Business will provide a lucrative vertical market 
where information will be either real-time (commod- 
ity prices, currency movements) or longer-lasting 
(company information, business opinion). Finally, 
STM (science, technical and medical) print-based 
information has been growing exponentially in jour- 
nal form and is looking to electronic publishing to 
make its prodigious output more accessible. 

Lastly and very importantly, a special kind of 
human creativity and imagination will be needed to 
develop electronic products that non-computerate 
consumers can respond to. Many of today’s CD- 
ROMs try slavishly to follow the format of their 
book predecessors when instead they should be de- 
signed with no preconceptions and purely at ‘content’ 
value. 


What problems are outside the publishers’ 
control? 

All publishers, whether current or potential elec- 
tronic information providers, when questioned, 
believe that the greatest initial barrier facing elec- 
tronic publishing is the low penetration of computers 
into the home. General consumers have not been 
buying personal computers, firstly because the hard- 
ware is too expensive, and secondly because there 
are no compelling applications to use on them; in 
turn, publishers are reluctant to produce the compel- 
ling applications when the market has not yet 
installed the appropriate hardware. This vicious 
circle will be broken in either of two ways: a ‘killer’ 
application will be produced which is seen as a 
‘must-have’ product, and thereby kick-start the large- 
scale sale of personal computers into the home; or 
the market-place for multimedia PCs and electronic 
products will mature gradually. This has begun to 
happen in the last few months with the arrival of a new 
generation of more powerful but competitively 
priced computers that have CD-ROM drives in- 
stalled as standard, and with publishers such as Dorling 
Kindersley launching five new products on to the 
market in readiness for Christmas. 

The rapid development rate in electronic technol- 
ogy is creating a good deal of uncertainty. Many 
publishers, unable to predict what will be happening 
in one or two years’ time, may make a huge invest- 
lishers can try to protect their copyrighted informa- 
tion in two ways: by displaying clearly on the product 
the conditions of sale and explaining the illegality of 
copying information; or by encrypting the informa- 
tion which can then only be unlocked by the purchase 
of a decrypt password every month. The former ploy 
should stop the consumer who is simply unaware of 
the laws of copyright, while the latter will temporar- 
ily check the professional pirate. Sadly publishers 
may only cover the unlikely loss of revenue by 
increasing the price of their electronic products. 

The acquisition of copyright has become a com- 
plex issue. While a book may consist of 80,000 
words and a few pictures, a general consumer elec- 
tronic product, such as Dorling Kindersley’s 
Encyclopedia of Science, consists of 80,000 words, 
600 colour photographs and illustrations, 80 
animations and video sequences and two hours of 
audio. Electronic projects are built on a much larger 
scale; finding the necessary materials and clearance 
rights is an arduous and time-consuming occupation. 

Electronic publishing requires considerable fi- 
nancial investment: electronic projects are much 
bigger in scope than normal print-based products; 
consultants and specialists need to be hired to pro- 
mote new technical and marketing skills; and online 
document delivery requires round-the-clock mainte- 
nance of large-scale ‘host’ databases. From the 
questionnaire results it could be seen that only the 
larger publishing companies had entered electronic 
publishing, while many of the smaller companies 
cited the large financial investment as the main disin- 
centive to developing electronic products. 

For these reasons, publishers are wary of entering 
electronic publishing alone. They prefer to create 
alliances (mostly with software houses) to share the 
risks and the development costs. At present the pub- 
lisher develops the content and concept and the 
software house designs the interface and retrieval 
engine but in the future, programming the software 
may become an in-house operation and publishers 
will look for other benefits from their alliances: 
acquisition of multimedia material from TV and 
film/music companies; online product development 
with telecommunication companies; and ‘bundling’ 
electronic products with personal computers and 
video game consoles. 

Electronic publishing will force publishers to 
adopt new operational tactics and strategic direc- 
tions: ‘single-source, multimedia-delivery’ projects 
will require access to a wider skills base through all 
departments (editorial, production, marketing and 
sales); they will need to become ‘content rich’ and 
develop their own digital archives of multimedia 
material; they will have to effect a much closer 
relationship with consumers, offering after-sales serv- 
ice and reacting to vital feedback; and they will need 
to become much more sophisticated in order to iden- 
tify and segment the appropriate markets, be it general 
consumer or niche, for their electronic products. 


July/August 1995, Aslib Proceedings 


The future of electronic publishing for book publishers within Britain 




will produce titles to be sold in conjunction with 
their portable hardware. TV companies will look 
to supply online information services in addition to 
their main channels. They will also create offline 
products (e.g. interactive cookery recipes!) culled 
from the consumers' favourite programmes. Like- 
wise, film/music companies could create CD-ROMs 
from their best films or artists. For example, Peter 
Gabriel has produced an interactive CD entitled 
Xplora 1 in association with his record company. 


Conclusion 

Bits and bytes, the units of digitization, can represent 
every data type: text, still image, animation, sound 
and video. The delivery of mixed-media information 
will become commonplace, and accordingly the dis- 
tinction between different media groups (books, 
newspapers, television, film and music) will become 
blurred. 

A potentially huge new electronic information 
market-place will be created. Many different indus- 
tries will compete to gain a toehold: book publishers, 
software houses, hardware manufacturers, TV com- 
panies, telecommunication companies, database 
providers, video games manufacturers, film and music 
companies and multimedia specialists. 

Traditional publishers will find that much of their 
material, once in book form, will be better served as 
an electronic product. Markets such as reference, 
education and business have already been exploited 
electronically. Soon, as the penetration of multime- 
dia PCs in the home becomes a reality, a whole host 
of lifestyle, entertainment and children's products 
will be made available to the mass market. Book 
publishers must act quickly or competitors will 
capture their niche in the market. They must seize 
the opportunity before it is too late. 

Publishers should welcome rather than fear elec- 
tronic publishing: it will create new markets, offer 
the capacity to diversify and give long-term cost 
benefits. The printed book will remain but its relative 
influence on publishers and consumers alike will 
recede. 


NOTES - findings from the questionnaires’ results 


Creation of new markets was considered by both 
current electronic publishers and potential electronic 
publishers as the main commercial benefit of elec- 
tronic publishing. 


Interactivity/manipulation was considered by both 
current electronic publishers and potential electronic 
publishers as the main attraction to the consumers. 


Reference was seen by both current electronic pub- 
lishers and potential electronic publishers as the 
largest market for electronic publishing. 


Both current electronic publishers and potential elec- 
tronic publishers saw traditional publishers as likely 


competitors in electronic publishing. Current elec- 
ment in CD-ROM publishing, only to find it a 
transient technology; it is this uncertainty that will 
compel many publishers ‘to wait for the clouds to 
part’ and then see what is commercially viable. 

Hardware standards need to be looked at closely. 
Publishers are given a choice of different platforms 
that are incompatible: PC, Apple and TV's CD-I. 
While the incompatibility between the TV and com- 
puter platforms probably will remain, PCs and Apple 
Mac should be made compatible. The much-her- 
alded PowerPC was meant to bring the two together, 
but has so far failed to do so. 


What risks do publishers face if they do not 
embrace electronic publishing? 

The digital age has arrived. The computer, TV and 
telephone have now converged to communicate in- 
formation in an electronic fashion. The world is 
entering a new information age, and the consumer is 
slowly but remorselessly beginning to come to terms 
with it. Today's younger generation is the first to 
accept the new digital revolution. Every following 
generation will demand their information products 
to be screen- rather than print-based. 

The traditional book publisher will have to em- 
brace electronic publishing to maintain his or her 
foothold in its old market niches and thus ward off 
the new competition. Although many publishers 
believe complacently that any competition will come 
solely from other traditional book publishers, other 
industries are eyeing electronic publishing covet- 
ously. Who are they? Software houses are anxious 
to diversify and see electronic publishing as a prof- 
itable business venture. Microsoft is the leading 
publishing software house: they have bought elec- 
tronic rights to and digitized all the works of art 
within the art galleries of the USA; and have al- 
ready produced the CDs Musical Instruments and 
Encarta, and are about to publish two new prod- 
ucts entitled Ancient Lands and Dangerous 
Creatures. Likewise multimedia specialists, such 
as Andromeda Interactive and Attica Cybernetics, 
have packaged CD-ROMs for book publishers. 
They are now starting to publish the products 
themselves. For online publishing, telecommuni- 
cations companies will be powerful competitors. 
BT is developing an ‘interactive service’ screen 
for TV, which will provide films on-demand, tele- 
shopping, education and training programmes, 
information services and video games. Mixed-me- 
dia conglomerates, such as Pearson or the Murdoch 
empire, will have a powerful influence on the 
market-place. They have the advantage of possess- 
ing strong financial backing, huge cross-media 
content and control of worldwide markets. Hard- 
ware manufacturers will want to create electronic 
products to be ‘bundled’ with and thus increase 
sales of their new multimedia hardware. For exam- 
ple, Sony has sponsored over 300 titles for its Data 
Discman. Similarly, video games manufacturers 
Regardless of how long a company has been involved 
in electronic publishing, greater penetration of com- 
puters into the home, standardization of software and 
hardware, and cheaper hardware and software were 
seen as the most important factors for consumer 
acceptability. 


Of the potential electronic publishers and non-elec- 
tronic publishers only 8 out of 23 had an annual 
turnover over £5 million. However of the potential 
electronic publishers 14 out of 18 had an annual 
turnover above £1 million. 


Of those with a turnover above £5 million, the most 
likely partner in a joint venture would be a software 
house. Those with a turnover below £5 million would 
just as likely choose a traditional publisher as a 
software house. 


Only those with a turnover above £5 million saw TV 
and telecommunication companies as possible 
competitors. 


Only those who have been involved with electronic 
publishing for more than 2 years saw telecommuni- 
cation companies as possible competitors. 


Potential electronic publishers with an annual turno- 
ver above £1 million cited familiarity with print and 
human reluctance to change as the main reasons for 
the market being slow to accept electronic publish- 
ing. Those with an annual turnover below £1 million 
cited the set up cost of hardware. 


Small turnover companies (< £1 million) only dis- 
tributed their electronic products through mail order 
and direct marketing. Large turnover companies (> 
£5 million) used many different distribution chan- 
nels but were the only ones to go also through 
consumer electronics retailers. 


When broken down into subject clusters, children/ 
education and children/entertainment products were 
distributed through consumer electronic retailers, 
while STM, reference and business were mainly 
distributed through bookshops. 


For potential electronic publishers, both those who 
will launch their products into the market ‘sooner’ (< 
1 year) and ‘later’ (> 1 year) stated that the uncertainty 
of the marketplace and the large financial investment 
required were the major risks involved in entering 
electronic publishing. ‘Sooner’ (< 1 year) placed more 
importance on copyright problems, while ‘later’ (> 1 
year) cited the rapid development rate in electronic 
technology. ‘Later’ also stressed the importance of the 
minimal marketplace for electronic products and un- 
familiarity with the technology. 


For the risks for not getting involved in electronic 
publishing, competition to capture the market share 
was a clear favourite with both ‘sooner’ and ‘later’. 
‘Sooner’ stressed the importance of acceptance of 
new technologies among the younger generation. 
‘Later’ was more concerned with lower library sub- 
scriptions. 
tronic publishers also strongly identified software 
houses and to a lesser extent telecommunication 
companies as competitors, while potential electronic 
publishers cited multinational corporations. Some- 
what surprisingly film/music companies, TV 
companies, and video games manufacturers were not 
seen as a threat. 


As possible partners in joint ventures both current 
electronic publishers and potential electronic publish- 
ers favoured software houses and other traditional 
publishers. Telecommunication companies, although 
seen as competitors by current electronic publishers, 
were however not seen as possible partners. Film/ 
music companies received no votes at all. 


Current electronic publishers had a huge preference for 
offline (CD-ROM) delivery. Potential electronic pub- 
lishers also strongly favoured offline delivery but were 
willing to combine both offline and online delivery. 


The unanimous choice for hardware platform was 
the PC. The TV and handheld electronic book were 
ignored. 

Input into the hardware platform was by keyboard 
and mouse only. 

For both current electronic publishers and potential 
electronic publishers the factors for gaining consumer 
acceptability were much cheaper hardware and soft- 
ware, and greater penetration of computers in the 
home. Potential electronic publishers cited strongly 
the standardization of software and hardware. 


For potential electronic publishers the main risk in 
not entering electronic publishing was seen as com- 
petitors capturing their market share. 


For both potential electronic publishers and non- 
electronic publishers the main risk for entering 
electronic publishing was seen as the need for a large 
financial investment. 


For both potential electronic publishers and non- 
electronic publishers the main reason why the market 
is slow to accept electronic publishing was seen as 
the consumer's familiarity with print. 


For non-electronic publishers the main problem fac- 
ing traditional publishing was the flood of books on 
the market. 


A large financial investment is required for elec- 
tronic publishing. Of 21 current electronic publishers, 
16 had an annual turnover in excess of £5 million. 


Different consumer markets demand certain media 
in their products. Entertainment, travel, and children 
demand both video and sound, while law and librar- 
ies are mostly text-based. 


All markets see interactivity/manipulation as a major 
attraction to the consumer. Education and entertain- 
ment cite multimedia strongly. Both speed and 
accessibility were seen as main attractions of legal 
products, but were considered inconsequential for 
travel and entertainment. 
For reasons why the market is slow to accept elec- 
tronic publishing, ‘sooner’ and ‘later’ both plump 
for familiarity with print and human reluctance to 
change. 


For factors needed to gain overall consumer accept- 
ability of electronic publishing, ‘sooner’ and ‘later’ 
both saw much cheaper hardware and software, stand- 
ardization of hardware and software and greater 
penetration of computers in the home as the most 
important. ‘Sooner’ cited much cheaper hardware 
and software as very important. 


All subject clusters strongly went for creation of 
new markets as being the main commercial benefit 
to the publisher. Both children/entertainment and 
children/education cited diversification strongly. 


For factors to gain overall consumer acceptability, 
subject clusters such as children/entertainment and 
children/education scored high marks for higher 
penetration of computers into the home and cheaper 
software and hardware. The STM/reference/ 
education cluster scored very low for higher pen- 
etration of computers into the home. Business/ 
reference scored very low for the standardization 
of hardware and software and better quality screen 
display. 
Abstract 
An industrial research laboratory has to take a long view of technology. There is increasing pressure to make a 
business case for research projects. There are plenty of data and analysis of the past. There are a great many 
sources of news about the present. There is no shortage of individuals and agencies willing to predict the future. 
What does the researcher make of these? How do researchers build their own pictures to complement their 
technical work? The aim of this paper is to provide an insight into the information needs to support technology 
forecasting. The paper discusses the research environment. It focuses on short-term requests which put the most 
pressure on information services. It considers the information needs; the available resources; and the reliability 
of forecasts. The emphasis is on the analysis of published market information to support researchers in their 


The introduction of computers accelerated the 
development of forecasting, making it more viable to 
maintain a forecast. Techniques range from the sub- 
jective ‘jury of executive opinion’ to mathematical 
techniques like exponential smoothing and trend- 
line analysis. 
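
As an illustration of the two mathematical techniques named above, the following is a minimal sketch rather than a method taken from this paper. The function names, the smoothing constant `alpha` and the demand figures are all assumptions chosen for the example.

```python
def exponential_smoothing(series, alpha):
    """Simple exponential smoothing.

    The first smoothed value is the first observation; each later value is
    alpha * observation + (1 - alpha) * previous smoothed value.
    """
    if not series:
        raise ValueError("series must not be empty")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must lie in (0, 1]")
    smoothed = [series[0]]
    for observation in series[1:]:
        smoothed.append(alpha * observation + (1 - alpha) * smoothed[-1])
    return smoothed


def fit_trend_line(series):
    """Least-squares straight line through the points (0, y0), (1, y1), ...

    Returns (slope, intercept).
    """
    n = len(series)
    if n < 2:
        raise ValueError("at least two observations are needed")
    mean_t = (n - 1) / 2
    mean_y = sum(series) / n
    sxx = sum((t - mean_t) ** 2 for t in range(n))
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in enumerate(series))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_t


# Hypothetical yearly demand figures, for illustration only.
demand = [100, 110, 120, 130, 140]
smoothed = exponential_smoothing(demand, alpha=0.5)
slope, intercept = fit_trend_line(demand)
next_year = intercept + slope * len(demand)  # extend the fitted line one period
```

A smaller `alpha` gives more weight to history and a smoother series; the trend line, by contrast, weights all periods equally and simply projects the fitted slope forward.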

The timescale over which the forecast is made 
varies according to the need being served. For ex- 
ample, a production facility has a short-term need 
to forecast the demand for each current product. 
Over a longer time frame, it needs to look at catego- 
ries of products, employment levels and costs. 
Looking further ahead, it needs to consider the 
demands for machinery, new technology and possi- 
ble plant expansion. R&D departments’ needs are 
more long term still. They are involved in new 
product introduction and the future demand for 
products which are yet to be developed. The accu- 
racy which is attainable — and indeed desirable — is 
related to the term of the forecast and the maturity 
of the technology. 


How reliable are forecasts? 

History is littered with the debris of forecasts which 
went awry. The most careful forecast can be up- 
staged by unforeseen events. Advances in technology 
may outstrip the predictions of 5 or 10 years ago. 
Figure 1 shows a prediction which was made in 1974 
of the relationship between price and performance of 
computers. Plotted against this are the actual price/ 
performance figures for two machines which beat 
the forecast trend by several years. Instead of just 
improving the price/performance of large machines, 
large-scale integrated circuits enabled small, less 
expensive computers to match the performance of 
their larger cousins. 


technology goals. 


Forecasting as applied to business is a relatively new 
discipline. Pioneered in the 1950s, it has developed 
apace with the introduction of computers. There are 
many forecasting methods, ranging from the subjec- 
tive to the highly mathematical. The potential 
applications within a single company are legion. 
This paper looks at the information needs to support 
forecasting in a technology research laboratory. Short- 
term requests for market figures can put a heavy 
demand on a lab's information services. The tech- 
nology area may already be familiar, and the sources 
well understood. But the request itself is often unique 
and unpredictable. It may be submitted at short no- 
tice, to back up a meeting or a business trip. This 
scenario challenges information professionals to come 
up with data, and to analyse it under pressure. 

The paper focuses on this particular need, in 
order to illustrate the use of information resources. 
An overview of forecasting is followed by a consid- 
eration of the information needs within a research 
laboratory. The paper concludes with a brief discus- 
sion of some of the information sources which are 
used to support forecasting. 


Forecasting in industry 

From its introduction to business in the 1950s, fore- 
casting has been taken on board by companies eager 
to work out which products they should develop, and 
to determine the market for their goods. In product 
marketing, decisions take into account forecasts of 
market size and the company's potential share. In 
production, forecasting is used to plan materials 
purchase, workforce planning and equipment pur- 
chases. Finance departments use forecasts to keep 
track of cash flow and expenditure. These are just 
some examples of the applications of forecasting. 
Figure 1: 1974 Prediction of computer price/performance in the 1980s 
From David A. Patterson and John L. Hennessy, Computer organization and design: the hardware/software interface¹ 


have invested; researchers' funding is affected by the 
perceived usefulness of and public interest in their 
technology; market researchers gain publicity and 
customers by making bold predictions. 

It is vital to keep a record of the sources them- 
selves. The country of origin and the date alone are 
useful clues to interpretation. The same organization 
may change its forecast dramatically in just half a 
year. This is particularly true of emerging technolo- 
gies, where there is much speculation, and where a 
breakthrough in technology or a change in legisla- 
tion could have a big impact. 

Finally, all forecasts must be brought back to 
reality. It is tempting to extrapolate a growing trend. 
But this must always be related to more concrete 
facts. What does it mean in terms of the population, 
or in terms of what people earn? How does the 
forecast relate to the money that companies are actu- 
ally investing? What benefit does the new technology 
offer over what people already have? 
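
The reality check described above can be sketched in a few lines. This is an illustration under stated assumptions, not a procedure from the paper: the function names are hypothetical, growth is assumed to compound at a fixed annual rate, and each household is assumed to buy at most one unit.

```python
def extrapolate(base_units, annual_growth, years):
    """Compound a starting figure forward at a fixed annual growth rate."""
    return base_units * (1 + annual_growth) ** years


def implied_penetration(forecast_units, households):
    """Share of households implied by a forecast (one unit per household assumed)."""
    return forecast_units / households


def years_until_saturation(base_units, annual_growth, households):
    """Years of compound growth before the trend reaches the number of households."""
    if base_units <= 0 or annual_growth <= 0:
        raise ValueError("base_units and annual_growth must be positive")
    years = 0
    units = base_units
    while units < households:
        units *= 1 + annual_growth
        years += 1
    return years


# Hypothetical figures, for illustration only.
installed_base = 1_000_000
households = 20_000_000
five_year_forecast = extrapolate(installed_base, 0.5, 5)
share = implied_penetration(five_year_forecast, households)
ceiling_year = years_until_saturation(installed_base, 0.5, households)
```

If the extrapolated trend implies a penetration close to or above 100 per cent within the forecast period, the growth rate cannot be sustained and the forecast needs revisiting.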

Forecasts for emerging technology are particu- 
larly speculative. The information analyst must be 
prepared to question the material, and to revise his or 
her picture of the market. 


The research environment 

A research laboratory is looking ahead to the future 
of the company. It develops technology which will 
be used in new products. Researchers are primarily 
concerned with the technology itself. In addition, 
they have a need to back up their research with a 
business case. At any one time, an industrial labora- 
tory will have a variety of projects at different stages 
of development. These range from speculative ex- 
plorations, the results of which may not be of use to 
the company for another 5-10 years, through to 
projects which are close to or actually transferring 
technology to a product division. The needs for 
market information vary from one stage to another. 
More commonly, predictions are made for a tech- 
nology which is emerging or even ready for use. But in 
the event it may either never make it into the market, 
or its entry may be delayed for many years. In the early 
1980s forecasts for home information services were 
predicting that up to 5% of US households would have 
videotex by 1985. All the enabling technology was 
available. One company spent $60 million setting up 
a service which never made money. For most people 
in the home, it was simpler to use newspapers and 
printed catalogues and encyclopedias. 

A change in fashion can turn a market on its head. 
A Financial Times² article exploring the pitfalls of 
the diamond market cites the value of coloured dia- 
monds. Ten years ago, these were worth less than 
colourless ones. Today, a blue 15-carat stone will 
fetch more than $300,000 a carat; the same stone in 
pink can fetch $400,000. Compare this with $50,000 
for a colourless stone. This trend was triggered in 
1987 by the sale at auction of a red diamond for 
$920,000 a carat. 

The job of the forecaster is to look at today's 
predictions; to interpret them; and to consider what 
may happen in the future to affect them. This re- 
quires looking beyond the published numbers, to 
broader issues like legislation, demographics and 
economics. 

Over time, an information specialist gets to know 
his or her way around the published sources. If two 
market research agencies have different views of the 
past and present, an investigation of this will help to 
evaluate and interpret their view of the future. The 
differences could be due to many things: the geo- 
graphical base of the respective agencies can introduce 
a bias; use of terminology may vary, especially with 
emerging technologies; underlying assumptions fre- 
quently vary, e.g. the life of a product in the customer's 
hands. Vested interests skew forecasts. Companies 
want to make money out of technology in which they 
on existing forecasts. However, a research labora- 
tory is concerned with the future. For any technology 
which is under development, or which the company 
is considering for a research activity, there must be 
an account of how this could make money for the 
company. This means looking at the market for 
products and services which could use the technol- 
ogy, or which could be enabled through the new 
technology. The types of information which aid this 
study include: 

published forecasts 

expert opinions 

forthcoming legislation 

demographics 

patents 

economics. 

The requirement may be to identify new markets 
to be enabled by the technology; it may be to look at 
trends in existing products, to which the company 
intends to add value. An example of this may be to 
forecast the population of computers for which the 
company plans to develop a software application. 

The questions are: how big will the market be? 
How much can we target for ourselves? What influ- 
ences are there which might affect this forecast? 


What information sources are available? 
There are plenty of consultants and organizations 
ready to predict the future and to provide the latest 
news. A few examples are described below. 


Market research reports 

Published market research is a major source of infor- 
mation. It predicts trends over time. The reports 
often contain historical data as well as future projec- 
tions. They also have an explanation of their 
techniques, of their own sources, and of how they 
view the market. These are usually the most detailed 
documents which deal with the future. They are also 
among the most expensive. They may have been 
commissioned or purchased by the company. For 
any study, the more views you can get, the better. If 
there are several forecasts for the same market, you 
can look for consensus, or for obvious differences. If 
there is only one forecast, there may be a reason why 
no-one else is sticking his or her neck out. The more 
mature the market, the larger the number of forecasts. 
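
Looking for consensus or obvious differences among several forecasts can be done by hand, but a small sketch shows the idea. The function name, the 25 per cent tolerance and the agency names are assumptions made for this example only.

```python
from statistics import median


def compare_forecasts(forecasts, tolerance=0.25):
    """Summarise several forecasts for the same market.

    forecasts maps a source name to its forecast value.
    Returns (median value, relative spread, names of sources that differ
    from the median by more than the given tolerance).
    """
    if not forecasts:
        raise ValueError("at least one forecast is needed")
    values = list(forecasts.values())
    middle = median(values)
    if middle == 0:
        raise ValueError("median forecast is zero; relative spread is undefined")
    spread = (max(values) - min(values)) / middle
    outliers = sorted(
        name
        for name, value in forecasts.items()
        if abs(value - middle) / middle > tolerance
    )
    return middle, spread, outliers


# Hypothetical market-size forecasts (in millions of units) from three agencies.
middle, spread, outliers = compare_forecasts(
    {"Agency A": 100, "Agency B": 110, "Agency C": 200}
)
```

A source flagged as an outlier is not necessarily wrong; it is a prompt to check its geographical base, terminology and underlying assumptions.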


Online databases 
Online searches can look for keywords, often with 
Boolean logic for more sophisticated searches. These 
can search for news articles; for information about 
companies; and for titles and abstracts of papers. 
Intellectual property databases can be searched for 
patents. Besides specific searches, alerts can be set up 
on some of these databases, to inform the researcher of 
— for example — new developments in technology. 
Some agencies offer news feeds. These can be 
ready-filtered, or they can be searched using key- 
words. 


All projects have one thing in common: the need to 
project the demand for the technology they are in- 
volved in. All are concerned with future products and 
growth for the company. Even a project in the most 
advanced stages of technology transfer has to consider 
the market needs several years into the future. 

To support these needs, an information service 
can track sources of information pertinent to the 
technologies under development. It must also re- 
spond to specific requests. A request may be 
technology-based. It may be product-based. The need 
for information of these sorts is often immediate. The 
task in this case is that of collecting information, 
assimilating it and presenting it in an appropriate 
form. The information worker acts as information 
gatherer and consultant to a project. 


What information is required? 

The business needs of a research project vary accord- 
ing to its maturity. An early speculative project will 
want to generate a broad-brush but compelling pic- 
ture of the need for new technology. The information 
needs at this stage could include demographic or 
customer information; or current market figures and 
projections for a set of products which the new 
technology aims to improve or replace. A more 
mature project may be making the case for a new 
business for the company; or it may be working with 
a manufacturing division on the application of new 
technology into new products. At this stage, a re- 
search project is working more closely with product 
developers. The researchers and their partners have a 
firmer idea of what products and services the com- 
pany could produce. 

Past, present and future all hold important clues. 
Historical data can show the rate of adoption of 
comparable technology in the past. Awareness of 
perennial problems, such as how to extend the cus- 
tomer base beyond early technology adopters to the 
wider market, comes from looking at past cases and 
experience. For example, Geoffrey Moore’s Crossing
the chasm identifies the ‘Technology Adoption
Life Cycle’, and uses examples to illustrate high- 
tech companies’ success or failure in acquiring 
mainstream customers. The new technology may, 
for instance, be targeting a particular geographical 
region. For example, providers of new technology 
for telecommunications may consider the history of 
adoption of existing technology in North America, in 
order to understand the demand for new networks in 
developing countries. 

A picture of the current market is essential. Which 
companies are involved? Who are the current and 
future customers, competitors and partners? What 
are they doing, and where? What are their plans? An 
understanding of today’s market helps to focus the 
future forecast, and to work out where the company 
can make money. 

Data on the past and present provide a useful 
information base. They also serve as a reality check 
The information analyst needs to have exposure
to as many sources of data as possible. Then
he or she can discriminate. A couple of months of 
browsing reveals which newsletters provide the 
most comprehensive coverage; who gets the news 
first; where to find data in the most usable format; 
and which sources are most often quoted by others. 


Conclusion 
This paper has concentrated on the use of informa- 
tion services to support technology forecasting. 
Forecasting for research means dealing with a 
large factor of uncertainty. The forecaster is 
looking several years ahead. The information spe- 
cialist supporting this activity must get to know 
their sources of information, and must be prepared 
to investigate the assumptions and bias underlying 
them. The important thing is to look for the big
picture, rather than for precise numbers. Forecasts 
from market research companies provide views 
into the future. Past and current data can 
complement this to provide context and a reality 
check. 
Periodicals

Newsletters and journals abound. Some of these are 
free. Most are not. Regular publications such as 
these can offer updates on the subjects covered by 
market research reports. 


Electronic sources 

Electronic sources are particularly useful. Depend- 
ing on the licence agreement, these can cut down on 
the effort of converting numbers into a usable for- 
mat, such as tables or spreadsheets. Some journals 
are now available in electronic form. Information 
Week and the Telecomeuropa are examples of this. 

An electronic table of contents service is useful 
for journals to which an organization does not want 
a full subscription, but which occasionally have 
articles of interest to the researchers. This service is 
particularly useful for highly specialized or periph- 
eral research areas. 

Electronic bulletin boards are becoming available. 
These are not always easy to find out about. But a good 
one can provide a ready filter for any particular area 
of technology. Many companies, government depart- 
ments and other bodies which collect information 
have started to put pages on the World Wide Web. 


Other sources 

Main trends are often published less formally. For 
example, the big numbers from a new report may 
appear in the press. There will not be any detail. But 
another view is provided. Similarly, companies some- 
times offer their views of the future, or of the current 
market, either in a specialist publication, or in their 
own annual reports. 
end-user interface to be used for the product.
For some, this is an easy decision — all the 
SilverPlatter titles for example are produced so 
that they can run under the one interface WinSPIRS 
— Windows SilverPlatter Information Retrieval Soft- 
ware. Other, perhaps rather newer publishers may 
well decide to buy in an interface produced by 
another organization. 

However this is achieved, the end product is 
made available to run under a specific end-user 
interface. The search engine will generally have 
very specific code to make it work with that inter- 
face — the two are completely intertwined and cannot 
be separated. As a consequence, you end up with a
situation where even a small library or information 
centre will have perhaps half a dozen different end- 
user interfaces for the different products it subscribes 
to. For those of you in that situation, you already 
know that it is not a nice situation to be in. Your user 
base may not need to know all the interfaces, but 
you as the professional must of course know them in 
depth, so you can provide help and assistance across 
all of the products you subscribe to. 

Would it not be so much easier if you could run 
all of your different products under the one inter- 
face? So that you could purchase or subscribe to any 
product you wished to and have it immediately 
running under the same interface that all of the rest 
of your products run under? That, in essence, is what 
a common user interface is: one piece of software
that you can use to search all your databases, regard- 
less of who you purchased them from, or what they 
cover, or where they are to be found — on a local CD- 
ROM, one held either optically or magnetically on a 
network, or even across the Internet. 


The advantages of a common user interface

1. It makes life a lot easier. 

It makes life easier for end-users, who only have to 
learn the interface once. Consequently, they can 
spend the majority of the time they are learning on 
how to search databases more effectively to retrieve 
the data that they want. Instead of worrying about ‘is 
it F2 or Cntl F to download’ they can delve into the 
much more delightful intricacies of the advantage of 
a field specific search as opposed to a proximity 
search. 


1995.


‘Then there entered into the hall the Holy 
Grail covered with white samite, but there 
was none might see it, nor who bare it. And 
there was all the hall fulfilled with good 
odours, and every knight had such meats and 
drinks as he best loved in this world. And 
when the Holy Grail had been borne through
the hall, then the holy vessel departed sud- 
denly, that they wist not where it became; then 
had they all breath to speak’ 

The above is, of course, a quote from Le Morte
D'Arthur (book 13, chapter 7) and refers
to the Arthurian concept of something which
was vitally important, but could never quite be lo- 
cated, except for the one or two who were pure of 
heart. Look at any industry and you'll find examples 
of their Holy Grails: the car which runs on water, the 
lightbulb that never dies, or indeed the version of 
software which is released with no bugs in it. 

For the information industry, and those of us 
who search across online databases, CD-ROM 
databases, library catalogues, the Internet etc, our 
Holy Grail has got to be that of the common user 
interface. This paper will look at the following 
aspects: 


What is a common user interface? 
What is the history behind it? 


What are the advantages and disadvantages of 
having one? 

How close is the Holy Grail of the common user
interface to being realized?


Once or if we have it, what differences will it 
make to our lives, and that of the industry within 
which we work? 


This paper is not going to get into some of the 
painful technical discussions that you quite often 
find when you open the lid of issues such as this. 
However, I will be making use of various common 
terms and phrases, but any which I think may not be
widely understood, I shall attempt to explain as and 
when appropriate. 


What is a common user interface? 

This is the first term that we really need to define as 
clearly as possible. Currently, each individual 
CD-ROM publisher has to make a choice as to the 
librarian offers assistance, and has an understanding
of one interface. 
So, in every way possible, life is made easier. 


2. Hardware 

A CUI will have a specific set of hardware require- 
ments, in terms of hard disk space, memory and so 
on. Of course, in some cases, that is going to be a
nuisance, especially if the minimum hardware re- 
quirement is greater than that which you have for 
some of your machines. However, this section of the 
paper is looking at the advantages, so I'll come to all
that later. For the moment, I'll be optimistic and say
that it means you are able to plan new purchases
with more confidence, that you'll be able to install
the software more easily and quickly, that there will be no
conflicts with other retrieval software, and that if you
encounter problems of one sort or another, it is going 
to be a lot easier to focus on, or eliminate from your 
investigation, the software as a potential cause of the 
problems. 


The disadvantages of a common user interface 
I have yet to discover something which does not have
as many disadvantages as it does advantages. And 
even a Holy Grail comes with its own problems. It 
certainly did as far as King Arthur was concerned:
his entire court was disbanded as one by one his 
knights went off to look for it. So in this section, I 
have looked at the problems which may be caused by 
having a common user interface. 


1. The lowest common denominator 

A huge number of CD-ROM products currently exist
in the marketplace — TFPL has estimated that by the 
end of this year there may well be about 9,000 of 
them, including general titles and games titles. How 
can any one single interface deal with this huge 
multiplicity? Just take two products — one a text- 
based product such as Medline, and another, which is 
geared rather more towards multimedia say, with an 
abundance of graphic images, sound and perhaps 
some moving video images as well. The answer of 
course is that it either doesn't, or it works at the 
lowest common denominator. The user is therefore 
constrained in what he or she can do, not because of 
technology, but because the interface does not do 
what it could do. Yes of course, it would be possible 
to have different 'modes' for example, but then that
rather defeats the purpose of the exercise. And on the
other hand, if you have a common user interface 
which deals with all the possible things that could be 
done with a product, and it runs, say, under a Win- 
dows environment, you will end up with so many 
buttons, dialogue boxes or pull-down menus that it 
will make piloting an aircraft look like child's play.
If you then decide to take the tack that some products
can, and will work under a common user interface,
but not others, then you've really not moved that far
forward.
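
The trade-off described in this section can be made concrete with a small Python sketch. The two feature sets below are invented for illustration (only the idea of a text-based product and a multimedia product comes from the text above), and the function names are hypothetical.

```python
# Hypothetical feature sets; the feature lists are assumptions for illustration
product_features = {
    "text_database": {"keyword_search", "field_search", "proximity_search", "download"},
    "multimedia_title": {"keyword_search", "image_view", "sound", "video", "download"},
}


def common_features(features_by_product):
    """Features every product supports: the lowest common denominator."""
    return set.intersection(*features_by_product.values())


def all_features(features_by_product):
    """Every feature any product supports: what a do-everything interface must expose."""
    return set.union(*features_by_product.values())


shared = common_features(product_features)
everything = all_features(product_features)
print(sorted(shared))
print(len(everything))
```

An interface built on `shared` loses field and proximity searching for the text product; one built on `everything` has to carry controls that are meaningless for most of the products it runs.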
It will also mean that end-users are able to carry
this information with them when they go from place 
to place. At the moment, people who search a 
database, and I am speaking specifically of the data, 
rather than the interface, are a little like taxi drivers. 
They know the data, and they know how to get what 
they want quickly and effectively, using the particular
tools at hand. The taxi driver similarly knows
how to get from the railway station to the local hotel 
quickly and easily, and knows how to make the best 
and most effective use of the tools at hand. How- 
ever, take a London taxi driver and place him or her 
in central Liverpool and ask to go from place A to 
place B, and he or she will be totally lost. To all 
intents and purposes, he or she is illiterate. The tools
may be the same (a cab, for example), and the data may
essentially be the same (hotels, railway stations and
so on), but he or she finds it much more difficult to
do the job. Similarly, the experienced professional
may well be a skilled searcher of Medline, know all 
the short cuts to use to obtain the information re- 
quired, but when taken to another interface (or 
another city, using the previous example) produc- 
tivity and knowledge take an immediate nosedive. 
With a common user interface, this will not neces- 
sarily happen — moving from one employer to 
another, which previously may have meant getting 
used to a different search and retrieval interface, 
will simply not happen — you’ll be able to sit down 
in front of the new computer and simply continue 
searching using the interface you have been using 
all along. 

A common user interface is also going to make 
life a lot easier for those of us who are involved in 
training people. Picture the scene, which for many 
of you will probably not be that difficult, of a library 
with a dozen, or two or three dozen different 
databases networked. Some of these may well have 
the same interface, but they will be in a minority. 
Users need to be guided to the most appropriate 
databases for their needs, and then taught how to use 
them. And if that means that the databases are 
produced by different CD-ROM vendors, it means 
showing them how to use the different interfaces. 
Each workstation needs to have its own little, or not 
so little, collection of cheat sheets, workbooks, user 
manuals and so on. Now, it’s bad enough for end- 
user training, it’s even worse for the library staff. 
They have to be proficient in all the different interfaces
if they are to be able to offer quick and helpful
advice when asked questions by the novice. This 
means in turn that they need to spend time either 
being trained, or using the software to ensure that
they keep up to date on them. They need to be kept 
informed when new releases of software come out, 
differences between old and new and so on. This is 
a tremendously wasteful task, but is unfortunately a 
very necessary one. A common user interface obvi- 
ates the need for all of this: the end-user learns one 
interface, the trainer trains one interface, and the 
Towards a common user interface


they have had in the past. In fact, there will be more 
choice than ever before for the librarian and the
end user. How is this going to work? By accepting 
a particular standard, programmers will be able to 
write different interfaces for different requirements. 
No longer will one interface have to meet all of the 
needs for all of the users. You will be able to 
choose from say, a medical interface designed for 
doctors, to a pharmaceutical one, or an end-user 
interface specifically designed to meet the needs of 
electronics experts. They will all be quite different,
so in that respect they are not going to be a
‘CUI’, but the area in which they will be exactly 
the same is the way that they communicate with the 
server, via the standard protocol. Server technolo- 
gies will also proliferate, and will allow librarians 
to choose server platforms to match their opera- 
tional budgets. 

Librarians will also be in a better position to 
decide how they give their users access to data, in 
the most cost effective and useful manner; be that 
data held locally, shared within a consortium, or 
accessed remotely across the Internet. With C/S the 
functionality will be exactly the same, and users 
will see no difference; they will not know, nor will 
they need to know if the data that they are pulling up 
onto their machines is from a database two floors 
away, or two continents away. 


Current developments 

1. Z39.50

Of all of the proposed standards, the one which has 
received most publicity within the industry is Z39.50.
It is an application layer protocol which allows 
developers to create a variety of distributed infor- 
mation retrieval applications; in other words, to 
access a variety of different databases and types of 
data using common commands, using the client 
server model. 

Now, Z39.50, in the words of Lorcan Dempsey 
‘defines the mechanics of interaction, an apparatus
for the expression of the syntax and semantics of 
queries and formats for the exchange of data’. This 
means that the client software and the server soft- 
ware can be produced separately; it's just the output 
of each which is converted into the protocol formats 
for transmission. Now, that transmission may take 
place within the same machine, within the same 
local or wide area network or, quite literally be- 
tween continents. 
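
The idea that client and server can be written separately, with only their output converted into an agreed format for transmission, can be sketched in Python as follows. This is a toy format invented for the example, using JSON; the standard itself specifies its own message formats, which this sketch does not attempt to reproduce. The function names and the message fields are assumptions.

```python
import json


def encode_query(field, term):
    """Client side: turn a query into bytes in the agreed wire format."""
    return json.dumps({"type": "search", "field": field, "term": term}).encode("utf-8")


def decode_query(payload):
    """Server side: recover the query from bytes, however they arrived."""
    message = json.loads(payload.decode("utf-8"))
    if message.get("type") != "search":
        raise ValueError("unknown message type")
    return message["field"], message["term"]


# The bytes could travel within one machine, over a LAN, or between continents;
# neither function needs to know which.
wire = encode_query("title", "grail")
field, term = decode_query(wire)
print(field, term)
```

Because both sides depend only on the message format, either one can be replaced without the other noticing.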


Current and proposed implementations of Z39.50 
I am grateful to the article written on the significance
of Z39.50 by Lorcan Dempsey for details on
who is doing what with the protocol. He has identi- 
fied 7 different implementors as follows: 


IRIS — an Irish current awareness service. The user
will be able to search six different library OPACs 
and request information from them. 
2. Stifling development

Developers develop. What they develop, and how 
they develop it depends very much in part on the 
feedback that they get from their paying public, but 
also from the ‘What if?’ factor. ‘What if we tried this,
or that, or the other?’ So they experiment and add 
new features and fiddle and play around until they 
have something new. But what if this new thing is 
really quite radical, and has not been considered 
before? In fact, things don't even need to be that 
radical to alter totally the way in which a piece of 
software runs — it could be as simple as someone at 
Microsoft deciding that feature 'x' would be good to
add into Windows '95 for example. Currently, this 
is very easy for developers to implement. They just 
do it, and make it available in a new release of 
software. 

Moreover, this then kicks all other publishers 
into doing exactly the same kind of thing — you've 
seen it all yourselves. Once one publisher puts out a 
thesaurus, all the others do, once one creates a Win- 
dows interface, all the others do as well. People 
thrive on competition, and working at the cutting 
edge of something. To what extent is that curiosity 
and developmental flair going to be limited by the 
common user interface? 


Client server architecture 

The basic component of a common user interface is 
something which is called client server architecture.
So, before we go off on our own quest for the Grail, 
let's get the concept of client server architecture 
clear in our minds. 

Client server is a method of separating two individual
functions: the client software requests
information or services, and the server makes that 
information or those services available. The way in 
which the conversation between client and server 
works is defined by a number of application protocols. 
Many Internet applications such as gophers will use 
this approach. 
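
A minimal Python sketch of this separation is given below. It runs in a single process with no network, and the class names, the request format (a plain dictionary) and the sample titles are all assumptions made for the example rather than any real product's design.

```python
class CatalogueServer:
    """Holds the data and answers requests; knows nothing about the user interface."""

    def __init__(self, records):
        self._records = records

    def handle(self, request):
        # The "protocol" here is just an agreed dict shape (an assumption of this sketch)
        if request.get("action") != "search":
            return {"status": "error", "reason": "unsupported action"}
        term = request.get("term", "").lower()
        titles = [r for r in self._records if term in r.lower()]
        return {"status": "ok", "count": len(titles), "titles": titles}


class SearchClient:
    """Builds requests and presents replies; knows nothing about how data is stored."""

    def __init__(self, send):
        self._send = send  # any callable that delivers a request and returns the reply

    def search(self, term):
        reply = self._send({"action": "search", "term": term})
        return reply["titles"] if reply["status"] == "ok" else []


# Illustrative records only
server = CatalogueServer(["Crossing the Chasm", "Le Morte D'Arthur", "Library Networking Guide"])
client = SearchClient(server.handle)
found = client.search("chasm")
print(found)
```

Only the matching titles come back to the client, which is the point made below about keeping unnecessary traffic off the network.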

What are the advantages of client server? In 
many ways, exactly the same as the advantages listed 
above for a common user interface, but there are 
some further points which need to be addressed. C/S 
allows much greater balance in a system. With some- 
thing such as Telnet (an Internet utility) much of the 
work is done at the host machine, while in a LAN- 
based system the desktop pc ends up doing a dispro- 
portionate amount of work. In a balanced C/S system 
you can ensure that only the information which 
needs to travel across the network, or from server to 
pc, actually does so. A good C/S system has low 
costs and high speed, and will outperform, in almost 
any way you can think of, more traditional methods 
of obtaining data. 

Now, there is no particular reason why there has 
to be one single client, and one single server; the 
advantage of using a common protocol such as Z39.50
is going to allow users much more flexibility than 
nology providers, distributors and information professionals.
Technology components can be combined
cheaply and effectively, whereas before it would 
have been regarded as too dangerous or costly. New 
solutions can be found, and new dimensions will be 
opened up: 

Users will be able to learn one interface and then 
get on with the much more serious business of actu- 
ally locating the data that they require, more quickly 
and effectively than ever before. 

Librarians and other information professionals 
will have to spend less time training and providing 
first level support to their users and will be able to 
concentrate on providing data in more effective 
ways. Instead of having to make a decision on 
subscribing to a product because of its search and 
retrieval interface they will be able to look solely at 
the actual data themselves, and make a decision 
based on them, instead. 

The information industry will flourish; already
there are thousands of databases available; this
number will only increase. A common user interface
will provide an incentive for other information pro- 
viders to make their data available; yet other 
companies will be able to provide a variety of end- 
user interfaces designed for specific markets. A 
common user interface will also be able to search 
across the Internet and make that huge wealth of data 
much more available than it currently is. 

In short, by the turn of the century we will all be 
able to drink from the Holy Grail simply and easily. 


Note: The above reflects the views of the author, and 
not necessarily those of his employer. 
Nordic SR-Net — Norway, Sweden, Finland and
Denmark are collaborating to link union catalogues 
together. 


Project ION — This is designed to link national 
interlending systems in the UK, Holland and France. 


DVB-OSI — a proposal to link a number of different 
German regional systems, to offer users transparent 
access to their services.


Socker — an implementation of SR (a subset of Z39.50)
between IME, UNI-C (a Danish academic organiz- 
ation) and FEK, a library computing organization. 


Europagate — a collaboration between Danish, Irish 
and Portuguese partners to implement a gateway 
between SR and Z39.50. 


The British Library is also implementing Z39.50
to its OPAC service. 


Other potential protocols 
There are a number of other potential protocols
which are currently being discussed within the industry.
Due to space constraints these will not be
discussed in this paper.


How close are we to achieving our Holy Grail? 
As can be seen from the examples mentioned above 
with respect to Z39.50, a wide variety of organiza- 
tions are taking the concept of a common user 
interface seriously; both they and their customers 
can see the undoubted benefits of such a protocol. 
SilverPlatter has also made a number of inroads into 
this, and has made arrangements with AmeriTech to 
allow cross-database searching, for example.


How is a CUI going to change our industry? 

What will the end result of this be? The industry is 
going to be able to move away from traditional 
means of providing data, in which the data, the 
server and the end-user interface are all inextricably 
linked, into a situation in which greater co-opera- 
tion is going to take place. There is going to be much 
greater synergy between information providers, tech- 
Julian F L Stubbs


BT National Business Communications, 2-12 Gresham Street, London EC2V 7AG 


Abstract 


The information revolution which we are approaching, the IT Information Revolution, is only the last of 4 major 
information revolutions which have punctuated man's own evolution. The upcoming IT revolution will make 
different demands on libraries; will offer them different opportunities. If these are not grasped then the eventual 
future for public libraries may be limited. Technology per se is not, in many senses, an important issue here — it 
is the way that technology is, or will be, utilized that will be the key. There will be a need to re-draw operational 
models of libraries to reflect this revolution and its attendant information superstructures — and this will be far 
harder than getting access to, or learning to live with, the technology. 


be argued that this has led both to mass education, 
and through the ready sharing of knowledge amongst 
many people, to the increasingly rapid series of
technological and scientific developments which we 
have seen since the Enlightenment. 


Information Technology. The last revolution (if one 
views records, tapes and videos, broadcast radio and 
TV as logical developments of printing, though in 
some instances more ephemeral) is that of Informa- 
tion Technology, and it is a revolution because it 
allows, for the first time, direct mediation of infor- 
mation and interaction with it. The IT information 
revolution has been stimulated by (and even necessi- 
tated by) the end results of the printing revolution, 
which have not only led to vast stores of information 
but have encouraged its continuing creation and 
updating, until the amounts of relevant information 
have snowballed to such an extent that they have 
become difficult to access or use effectively. 


What differentiates the information technology 
information revolution? 

What, then, are the characteristics which mark the 
Information Technology effect? 


Computing and telecommunications convergence. 
One view is that it in some way describes the conver- 
gence of telecommunications and computing, and 
before too long arguments begin to revolve around a 
whole series of exciting acronyms and buzzwords: 
fibre, broadband, ADSL, T-PON, ISDN and B-ISDN,
SMDS, digitization, compression (MPEG1 & 2),
client/server architecture, intelligent agents,
multimedia, Internet and so on. These are the techni- 
cal building blocks which allow the IT information 
revolution to be realized, but they are no more impor- 
tant than that. Almost all the technological develop- 
ments needed for the IT information revolution are 


The information revolutions 
It can be argued that mankind has seen 4 major 
information revolutions. 


Speech. The first, the development of speech, allowed 
man to communicate information to his fellows, and, 
eventually, to store that information for future use 
through folklore and oral tradition. Speech allowed 
man to stop living for the moment, and gave him a 
framework through which to develop such abstract 
ideas as religion. Speech was the medium through 
which man first began to create and use information. 


Writing. The next big step was the development of 
writing, which allowed the storage of information in 
a semi-permanent form not so easily capable of the 
distortions of an oral tradition. It also allowed im- 
mensely boring, but no doubt useful, information to 
be readily stored and accessed, such as detailed 
inventories — and it is no surprise that the eventual 
translation of Linear B, and much of the content of 
cuneiform writings, should turn out to be lists of 
things. It is arguable that it is writing (and its associated
skill of the recording of, and eventual calculation
with, numbers) that allowed the creation of (or at the
least supported the swifter movement towards) de- 
veloped economies, moving away from subsistence 
to wealth creating societies. 


Printing. Reading and writing, if the developed civi- 
lizations of Greece and Rome are put to one side, 
were primarily the remit of a scholarly, and frequently
priestly, class, and it is not until the next
major revolution, that of printing, that the possibili- 
ties of universal access to stored information became 
practicable within numerically large societies. Print- 
ing not only led to more copies of existing works 
becoming available, but changed the economies of 
creating new material for dissemination, and it can 
change which is not yet, in most cases, being effectively
delivered.


Conceptual models 

Having reached a definition, I hope, of what the end 
game looks like, let me step back again to make a 
slight digression into conceptual models. 


The connectivity network model. In the telecommu- 
nications business, and clearly that business is and 
has to be a player in the information superstructure, 
we have a very simple model of the network. At the 
centre is the core network which links main (now 
digital) switches by huge (in terms of capacity) 
optical (glass) fibre pipes. These then link local 
switches, off the end of which hang customers and 
customer sites, some already served by fibre links. 
We demonstrate connectivity by showing how calls 
pass from customers at one side of the network 
through the central core and out to the other side 
(fig.1). The switches (main and local) are now, in
fact, huge and complex computers and there is a
cloud of 'intelligence' which envelops them, as
well as intelligence applied at some customer points.

As a further digression this diagram illustrates 
that the 'superhighway' is already in existence — 
what the arguments in fact are actually about are the 
slip-roads on to the superhighway — the access 
points. There is, in fact, no shortage of technology to 
support this access; technology is not an issue, although
cost (who will pay, and how much),
applications (what it will be used for) and, to some 
extent, regulation are. 


Information superstructure model. When we come 
to look at a schematic for an information superstructure,
a different, and interesting, picture
emerges (fig.2). 'Information' is not about connec- 
tivity or switches, nor does information have any 
obvious or predetermined method of storage or
transmission. To write this paper I talked to indi- 
either already designed and working publicly, working
in large scale trials, or working in the laboratory.
Over time they will be further refined, get quicker, 
smaller, cheaper, easier — but the Gutenberg press 
stage is already achieved and bettered and now we 
are on the polishing, not the initial carving out. 


Content and mediation. Another takes the view that 
what is the key to all this is the coincidence of content, 
with all the richness that offers, with mediation, broadly 
the software rather than the technical hardware, in- 
cluding again such exciting concepts as intelligent 
agents, cut and paste, groupware, trading, filters, 
morphing and so on. In the past we have tended to treat 
information sources as separate from each other — 
there is the written word, the recorded word, direct 
information from other individuals, images. The in- 
troduction of intelligent software allows these to be 
brought together effectively, and media translations to 
be made so that text can be read out loud by a machine, 
and a voice message turned into a fax. More impor- 
tantly perhaps it allows machines to winnow the great 
mass of information to isolate the relevant and up-to- 
date. Almost as important, the information is delivered 
in ways in which it can immediately be reused within 
newly created documents (which might also include 
multimedia documents, films, animations, voice mes- 
sages and so on), cutting down time to create and cost 
and often increasing quality. This gets closer, in my 
view, to the heart of the matter, but it still very much 
lists enablers and causes, rather than the real effects of 
the IT information revolution. 


Simplistic view. I take, I am afraid, a very much more 
simplistic view — which may involve everything I 
have talked about, but which can be summarized as: 
'Getting the right information (and only the right 
information) to the right person, in the right format, 
at the right time.’ Those who have been involved, as 
I have, with searching for information for a purpose, 
will know that, simple as this sounds, this is a step 





Figure 1. A typical network topology as a connectivity model 
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the intelligence will not just support the user of the 
information but will also provide the necessary con- 
trol and management for the information suppliers, 
building up profiles of customer use, providing charg- 
ing mechanisms and ensuring that the information 
flows freely and effectively to where it is wanted. 

This machine intelligence is not, of course, some- 
thing really new, in concept at least; it replaces, or 
disintermediates, what we have used in the past, a live 
human expert or researcher. What it does, however, is 
what the human researcher or information manager 
can no longer do: it copes with the extraordinary 
volume of information (both the sourced information 
being requested by the user and the metadata which 
surrounds the usage) which is now available, and 
presents it in ways which are of immediate use. Sitting 
at home with a computer, surfing the Internet, with a 
few relatively simple applications, and perhaps some 
stored images on CD-ROMs and a halfway decent 
colour printer I can, in a few hours, print (or send 
electronically to up to 30 million Internet users) a 
piece of work which could have taken a dozen re- 
searchers, typesetters, photosetters and printers a month 
to produce only a few years ago. I can already use 
search routines which will hunt out what I need (and 
much more usefully, discard the gigabits of 
information I don't need). Running a Web Server I can track 
who is using what information, and what information 
is not being used and could be discarded. 

In summary, the information superstructure of 
the future will be focused on the individual, location 
independent, multiple sourced, and technology inde- 
pendent. Mediation will be the key for the customer, 
and service management for the supplier. 


The impact on the public library of the future 


Why should this worry providers of Library serv- 
ices? They already have books and magazines and 
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Figure 2. Information superstructure topology 


viduals face to face and on the telephone, exchanged 
email, took published information from books, 
physical journals and from computer servers, some 
accessed over the Internet. The only commonality, 
and the natural centre for an information super- 
structure schematic is the individual user — as 
information is sought on-site from people, 
electronic or paper media, and off-site from live 
individuals and recorded data, and from direct 
computer access, access via servers and from broad- 
cast media. 

This model (fig.2) illustrates two key things. The 
first is that information is available, and is delivered, 
via a multiplicity of different media. Many of these, 
most in fact, are mono-media, and it is people who 
internalize them into personal multimedia 
experiences, only to externalize them, as I am doing in 
writing this paper, back into a mono-medium — in 
this case of print. In a few cases true multimedia 
'documents' (if that is any longer an appropriate 
word) are created, which include print, sound, mov- 
ing and still pictures and still and moving diagrams 
(animations) — and no doubt in the future a fuller 
multimedia experience will be possible, with scents 
and flavours synthesized, so that in reading, or per- 
haps experiencing, Proust we actually will taste the 
madeleine as he did. 

The second interesting thing, I believe, is in the 
clouds of 'intelligence' (technologists always draw 
clouds when they are not actually sure how things 
will work) associated with some of the information 
streams. It is this intelligence, which allows 
information to be found and to be manipulated 
and which will eventually allow information to find 
you (through the somewhat nebulous offices of the 
much vaunted ‘intelligent agents’), that marks the IT 
Information Revolution as being a real revolution, 
and not simply a further development of the last. But 
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ture, and the individual customer, not a class or 
group of customers, which forms the other. The 
overriding need will be for the two hubs to be joined 
effectively — for the customer to be put in touch with 
the information, delivered eventually through the 
medium of customer choice. 

Where a human cannot hope to locate specific 
topics or pieces of information from within all pub- 
lished works, or to hold in his or her mind the needs 
of individuals he or she may never even have met 
before, a machine can. Individual profiles of needs 
and interests can be held on computers (or on smart- 
card chips on new style library tickets?) and can 
immediately prompt offers to deliver regularly used 
information, or suggestions as to new classes of 
information which are now available. Programmes 
can be set to ‘learn’ from changed uses the changing 
interests of individuals. Most new information is 
created electronically, and could be ‘held’ in that 
way, rather than the traditional ways of libraries; and 
increasingly important and relevant ‘old’ informa- 
tion is being, and will continue to be, digitized. 

This does not come as any news to librarians, and 
already attempts, some of them less successful than 
others, have been made within the public library 
service to take on board this change of focus, often 
thwarted, or substantially watered down, by local 
funding influence. However, libraries run within 
companies have been generally more adept (and 
resourced) at recognizing, and implementing, solu- 
tions for individual user information needs. 


BT experience. The BT business and marketing 
information library is called, quite intentionally, 
the Information Resource Centre, and its stock-in- 
trade is not volumes or publications, but information, 
even though it does provide some of the traditions 
of a library: a reading room, books on shelves, a 
catalogue. This meets the needs however of only a 
few of its internal customers, who have the time, or 
the inclination, to undertake their own hands-on 
research. Most of its customers either contact the 
IRC by phone, or fax, or email to request specified 
bits of information, relying on the IRC staff to 
know which is the best source for this; or pre- 
specify their interests, allowing electronic profiles 
to be made which will be delivered to them auto- 
matically and regularly. IRC staff in addition 
develop proactively, through contact with custom- 
ers, new ‘publications’ which concentrate and group 
relevant information, thereby reducing the levels of 
ad hoc information enquiries and releasing staff to 
work on high value-add research. We are experi- 
menting with the use of World-Wide-Web servers 
and of groupware, such as Lotus Notes, which will 
allow customers direct access to source material 
with appropriate navigation tools to allow them to 
pinpoint the information they require. We already 
are recognized as very sophisticated users of a 
number of software tools, such as TOPIC. 
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journals, videos and cassettes and records and audio 
CDs, some already have access to the Internet or dial 
up information services (more often within business 
or educational libraries). With enough money (of 
course always a key caveat) they can provide termi- 
nals for more ‘readers’. Access to the superhighway, 
whatever that is, can be seen as just another informa- 
tion source, just like books, or magazines, or 
CD-ROMs. One position would be that this is a 
change in focus perhaps, but not in the nature of what 
a library is. 


What is a library? But what is a library? I would 
argue that a library is (certainly was) most like a 
warehouse: a book warehouse, carrying stock for a 
mass market of customers. Librarians’ chief con- 
cerns can be seen as storekeepers’ concerns — 
inventory control, location identification, storage 
conditions — to serve mass classes of customers — 
young children, schoolchildren, students, 
businesses, the elderly, fiction readers. What is so 
frequently requested it should be kept at the front of 
the warehouse, what so infrequently it can be kept 
at the back, in the stacks? What needs to be 
specially housed (perhaps because of value or 
because of some fragility)? How many of which 
groups are to be served — what is the catchment 
area? How much resource is to be committed to 
serving which customers — what stock is not carried 
because demand for it is low, or because the cus- 
tomers for it, as a matter of policy, will not be 
served? How much of the stock location work can 
be placed on the customer, so relieving the ware- 
house staff for more important work, such as 
cataloguing and recording? 

Libraries (for traditional librarians) can be seen 
to contain artefacts, not, despite a romantic longing 
that this would be so, ideas. Many, perhaps most, 
librarians are primarily concerned with the contain- 
ers which hold the information, not with the 
information itself. Librarians give customers access 
to the information (or entertainment) containers, di- 
rectly, through cataloguing or through brief content 
summaries. The customers still have to locate and 
release the information from the containers them- 
selves, nor can they be sure, without considerable 
personal research, that they have all, or the most 
relevant, or the most up-to-date containers to hand. 
Whilst librarians frequently get to know their regular 
customers, their likes and needs, and many have 
some knowledge of the content of the containers, or 
their comparative value as sources (particularly in 
specialist libraries) in general librarians have neither 
the time, the training or the knowledge to act as 
information access experts for their customers as 
individuals. 

The IT information revolution can change, and 
probably will change, this focus. In the future it will 
be the information, not the source of the information, 
which forms one hub of the information superstruc- 
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Implications of the new role for public libraries 
There are probably five main areas where I would 
expect this to impact, eventually, on the public library 
service. 


Proactive navigation and delivery systems. I expect 
the increased availability within libraries of systems 
which will allow automatic sort and filter, initially of 
sources, eventually of information. If I want to know 
about the Korean economy post 1988 I will key in 
those key words, and expect to be offered only 
information containers (and eventually information) 
which qualify — without having to go through a paper 
or electronic catalogue myself. I will expect that 
convenience in every library I visit, and preferably 
operated in similar ways, so that I can transfer my 
learning experiences. I will also expect, if I am doing 
a particular course of study, to key in (or click on) 
some generic description (NVQ level 3 in applied 
woodwork, for example) and be pointed immedi- 
ately to all the (automatically updated) required 
sources and appropriate texts. I will eventually ex- 
pect text-based navigation systems through an 
increasingly voluminous digitized information base, 
and delivery of appropriate texts directly to the out- 
let, and in the medium, of my choice. 


Location. I expect the future library both to have, 
within the time frame we are discussing, a real 
physical location, but also to be addressed remotely, from 
multiple locations, including schools, other ‘public’ 
places and home. Off-site access will probably 
require higher levels of user skill (and of course 
investment), but this, in itself, will allow librarians 
greater ‘space’ to concentrate their skills on the less 
well economically favoured, and the less skilled, 
whilst still being perceived as serving (and hence 
deserving financial support from) the wider 
community. Indeed as (and if) the public library of the future 
starts to look more like the internal business libraries 
of major companies, such as BT, it may well be able 
to provide smaller companies with similar levels of 
service, which it otherwise could not afford, and use 
this as a revenue earning opportunity. 


Co-operation. I will expect libraries to make full use 
of information technology to manage effectively 
their services, from data capture which allows 
profiles of usage and of users to be built, thus allowing 
better customization of services, and better 
‘individual’ attention, to the use of electronic ordering 
and stock control so that the drudgery of clerical 
work associated with acquisition and loan manage- 
ment can be off-loaded to machines, thus releasing 
people resource to more value-add roles. 


New skills. The librarian of the future will gain new 
skills which reflect an increased value-add role, both 
in terms of helping users through multiple 
information sources (and the systems that surround them) to 
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The IRC is targeted with achieving high value for 
money from its expenditure on primary sources, 
which means ensuring that information is available 
in a timely manner to as wide a range of people 
within BT as possible, and that it is actually being 
used. Clearly the driving forces behind a library such 
as ours are different from those of public libraries 
(though not all that different I would suggest) but the 
tools and capabilities we are developing, and which 
are being developed by other commercial informa- 
tion suppliers are as applicable in their use to public 
libraries. 

The role of the IRC is to act as a window which 
provides a user-friendly front end to people who are 
not information scientists, to the information they 
need. Some of that information is held in-house, on 
paper or electronically, but some is held by other 
libraries or information providers. The IRC is used 
by people within the company because it is 
cost-effective, because it is easy to use (and it intentionally 
offers a variety of entry strategies to the information 
to suit different ability levels and customer require- 
ments), and because it delivers. 

Libraries such as the IRC are creating new ex- 
pectations amongst sophisticated information users, 
which will readily percolate to a more general pub- 
lic, about how information should be presented to 
users, and how much the drudgery of information 
location should be taken away from users. With 
reductions in the costs of computing and communi- 
cations, and the real creation thereby of a true 
global village, there will be both commercial and 
public libraries, very possibly based outside the 
UK, which will start to offer these services in an 
affordable manner to a significant proportion of the 
general public in the UK. 

The key aspects of the services offered by the 
IRC are that it is an information gateway (which can 
be used on-site or remotely) which has used the 
techniques of mass customization to pre-prepare in- 
formation digests for users, in order to concentrate 
its experts on high value-add work, with a focus on 
pro-active information sourcing and digesting, rather 
than simply reactive response, aiming to automate as 
far as possible standard searches and information 
delivery to achieve a highly cost effective service. 
Users can now gain information from the IRC with- 
out troubling information researchers at all, and can 
themselves set up regular deliveries of updated infor- 
mation — on a daily basis if required. (Other 
companies’ systems, particularly those involved with 
trading, provide real-time updates of selected infor- 
mation.) 

If UK public libraries do not themselves 
move towards this new model they face the danger of 
increasing marginalization — and just as the advent of 
printing and the reduction in cost of books led to the 
demise of the chained library, save as a curiosity for 
tourists, so will the book warehouses become curi- 
osities in a world of information resource centres. 
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low — and many who access do so from work or 
educational sites, not from home. In the same article 
Jamie Muir of Packard Bell UK is quoted as saying ‘If 
they can get on the Internet, what are they going to use 
it for? They don't know where to go, there's no index 
- it's something you have to be very smart to use. For 
the average consumer, until the interface becomes 
much simpler and there are good indexing tools avail- 
able, the Internet won't realize its potential.’ 

The public library service (with its funders) does 
have time to make the changes that I have suggested; 
the window of opportunity is not even starting to 
close, but those changes will take time and invest- 
ment, both in physical resources and in skills 
enhancement, and public libraries are not the only 
players in the game. 
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arrive at the information they need, and in under- 
standing differentiated needs of customers, linked to 
better understanding of information content so that 
pro-active routines and information delivery sys- 
tems can be developed. Librarians will act as 
information concierges - like the concierges of the 
top hotels, knowing the best places to visit, making 
the arrangements for customers in languages they 
don't even speak, giving the customers what they 
need before they have even asked for it - rather than 
the more traditional gatekeeper role where access is 
seen as limited and as a privilege. 


Danger of non-adoption 

If the funding authorities of the public library serv- 
ice do not eventually allow it to follow this path it 
will find that the growth of alternatives — home- 
based surfing of the Internet, commercial companies 
providing similar services — will begin to erode 
substantially the public library information user 
customer base and may lock public libraries effec- 
tively into a much more limited role (though still an 
important one) of meeting the entertainment needs 
of its audience — concentrating on the fiction/mu- 
sic/video end of the market. These are clearly 
important, and will continue to be so, but substan- 
tial reduction of the information side of public 
libraries, whilst possibly being supportable within 
a more limited library service, will also lead to the 
further creation of a two nations effect, with an 
increasingly large group of ‘information disenfran- 
chised' growing within the community. Investment 
in an information focused (although not exclusively 
so) public library service is an investment in the 
future of local communities. 


Don't panic 

This will not (and need not) happen overnight — the 
UK is still not a computing or information focused 
society. A recent study shows that only 25% of UK 
households have a computer (as opposed to games 
machine), and that only a further 11% are prepared to 
indicate that they plan to buy one. ‘Accessing the 
Internet’ does not even ‘register’, according to a 
summary of this report in the Independent, ‘on the list of 
most popular PC usages, where word processing and 
games dominate’. Growth rates in Internet users in the 
UK have been spectacular only because the base is so 
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ards etc., are less than optimum for the realization of 
the ideal. As a result, much of the literature deals 
with technological aspects and the research is con- 
ducted within frameworks set by the library metaphor. 

It was my original intention (as the title of the 
paper suggests) to talk about social and economic 
factors in the implementation of the electronic li- 
brary. However, the more I thought, the more I came 
to realize that it may be the library metaphor that 
restricts our thinking and holds us back from the 
development of new systems that may approach 
more closely the ideals of scholarly communication 
and the transmission of knowledge. 

I wish to suggest that terms such as electronic 
library, electronic journal, and electronic publishing 
all stem from a failure to stress that the core of the 
revolution we find ourselves in is not that existing 
systems and activities now have an electronic form, 
but that library, publishing and journal are archaic 
and obsolescent, if not yet obsolete, ideas. 

When we seek the underlying concepts of these 
terms we discover words like store, transmit and 
distribute — the basic ideas are those of storage, 
organization and communication. It will be useful, 
therefore, to replace the three words I focus upon by 
the term communication system. System is not in- 
tended here to mean a technological system — but any 
system of human communication. Of course, librar- 
ies, journals and publishing are systems of 
communication, but I believe we need to step back 
from these specific manifestations of communica- 
tion systems to the underlying, generic idea. 

A new system of scholarly communication, based 
on electronic systems and networks, not only neces- 
sitates new models for the concepts of journal, 
library and publishing, but also new interpersonal 
and institutional mores, customs and practices, and a 
new basis for the economic conditions associated 
with communication. 


Continuous and discontinuous change 

History locks us into particular modes of thinking: 
we are what we have become, with all the baggage of 
the past. The past, in information terms, has been one 
of gradual processes of change as technology and 
other social and economic factors have made changes 
possible. Thus, the modern library would be recog- 
nized by Panizzi as being a direct descendent of the 


Introduction 

I have given this address the title, ‘In the beginning 
was the word...', which is not exactly a novel coin- 
age! However, they are words worth remembering, 
since they draw attention to the primacy of the word 
in the formation of our understanding (and, indeed, 
definition) of the world around us. Words still have 
that primacy, but they can be illuminated by images 
and moving pictures and by numbers and sounds. 
Electronic communication now provides the means 
for integrating words, sounds, numbers and pictures 
to present ideas and research findings more rapidly 
and more effectively — and the world is changing. 

We find ourselves in a world that commentators 
have described as chaotic: a world in which it seems 
that the pace of change grows ever faster — although 
this may be more of a function of the apparent 
disorder in our lives than in any true acceleration. We 
are in the middle of some kind of turbulence, a space- 
time whirlpool in which direction is difficult to 
determine, since so many changes are happening 
apparently simultaneously and the consequences of 
many developments are difficult to determine. The 
lone pilot, flying at night, can find his way and keep 
his aircraft level, moving in the right direction: he (or 
she) has instruments that provide the necessary in- 
formation for the appropriate calculations and actions. 
We have none, or those we have in the way of 
indicators and trend lines, are crude by comparison. 
We seem, therefore, to be buffeted by change and we 
grope around, seeking direction in the selection of 
research avenues; in the management of organiza- 
tions; in decisions that affect our future, or the future 
of our companies, schools and universities; and in 
the design of curricula for the transmission of knowl- 
edge, which, in some areas, is changing faster than 
we can change the curriculum. 

Within this world, the idea of the electronic 
library has emerged as a model for future systems, 
already implemented in some forms and to some 
degree in various places. Technology drives the 
development of the idea: but it is not always appropriate 
technology; more often, it is technology created for 
other purposes. The possibility, rather than the proven 
value of the technology, is testified to by the volume 
of research that addresses how to work round techno- 
logical limitations to implement the desired system. 
Hardware, software and telecommunications stand- 
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Electronic communication 
The key feature of electronic communication sys- 
tems is that they offer the possibility not only of 
disseminating information over telecommunication 
networks (local or global) but of interpersonal com- 
munication among collaborating authors or producers 
and, perhaps more importantly, interpersonal com- 
munication between author, reader, user or consumer 
of the information. Thus, interpersonal communica- 
tion may be employed not only to give virtually 
instant feedback on the value, usefulness, veracity or 
validity of information, but also to provide a basis for 
the participative development of a document — not 
only between collaborating authors in the usual sense, 
but also in terms of the participation of the intended 
audience in the design and content of the document. 
I shall return to this aspect of electronic commu- 
nication shortly. 


The social and economic factors 

If we wish to explore the potential of electronic 
communication systems for academic organizations 
or for intellectual communities, we must examine 
the social and economic factors now operating in 
universities and in scholarly disciplines. I will con- 
sider three of these factors: the idea of an academic 
community, the mores of the academic reward sys- 
tem, and the economics of scholarly communication. 


The idea of an academic community 

The word university still conjures up in people's 
minds, I believe, the old ideal of a community of 
scholars, researching to advance the boundaries of 
knowledge, teaching to communicate that 
knowledge to new generations. However, over the past 
twenty years that ideal has taken some severe knocks. 
We have seen growth in student numbers, constant 
pressure of financial restraint, the disappearance in 
the UK of the binary line between universities and 
polytechnics, and the market orientation of much 
curriculum development. All of these have tended to 
move the idea of a university from that of a ‘temple 
of knowledge’ to that of a factory. So, together with 
all of the other changes in society, we have 
witnessed a very significant change in the very idea of a 
university, and this change affects the way research 
is carried out and the way students are taught; there 
is, altogether, a more instrumental view of education 
and research. 

The key features in the present situation are: the 
considerable increase in student numbers, without 
compensating resources from the state; the speed of 
change in knowledge in certain areas; and the 
increasing instrumental orientation of the entire system. 


The mores of the academic reward system 
The curious thing is that the nature of the reward 
system in academia has remained much as it was, 
when the ideal was closer to being realized. Move- 
ment from probationary status to confirmed lecturer, 
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library of the 19th century, Cutter would recognize 
the modern OPAC as a (rather corrupted) version of 
the bibliographic tool for which he designed rules of 
preparation and arrangement, and the online biblio- 
graphic database is not so very different from the 
traditional catalogue. In other words, even with the 
impact of the computer on our way of doing things, 
what exists is the result of continuous, incremental 
change. 

The same may be said of the book and the jour- 
nal: printing technology has advanced enormously 
since Gutenberg set his first Bible, or since Pi Sheng 
in China, four hundred years earlier, produced the 
first movable type. But the model that the book and 
journal use is no different: we have had five hundred 
years of development in printing technology with no 
significant change in the fundamental form of the 
artefact produced. 

In today's information world, however, we are 
now truly in the age of discontinuity (as Drucker 
termed it as long ago as 1968). Discontinuous change 
happens when an innovation, or series of innova- 
tions (which may be social, technological or 
whatever), leads to a sudden jump in the process of 
change, so that the smooth curve of development 
moves on to another plane. Numerous management 
gurus, from Drucker to Peters to Handy and the rest, 
have drawn attention to this phenomenon and have 
proposed ways in which organizations are likely to 
change, as a consequence. Whether any one of 
these sages is right in his forecasts remains to be 
seen, but there is at least some unanimity of the 
need for change in the information society, in the 
information age. 


The nature of the discontinuity 

It would be easy to describe the nature of the discon- 
tinuity as it affects scholarly communication in terms 
of the Internet, which is its immediate manifestation; 
but, in fact, the real agents of change are the stand- 
ards, telecommunications systems, software systems 
and hardware developments that are making multi- 
media, interpersonal communication systems (shall 
we call them MICS?) a reality. And those systems 
are making possible many things that we could only 
vaguely hint at even a few short years ago, and I have 
seen it suggested that 1996 is the year in which 
everything will come together. 

We are, perhaps, at the beginning of this process 
of discontinuous change, since some things have 
been possible through the Internet for many years. 
Mainly, however, they have been text-based: email, 
file transfer, bulletin boards, news-groups and so on. 
But, quite suddenly, we have the fusion of text, 
sound, image and video (including live video through 
systems such as CU-SeeMe and the same kind of 
functionality delivered over ISDN lines commer- 
cially). So we have a situation in which the potential 
in communication systems is several times greater 
than we have ever previously known. 
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reward system, not from the accounts of the pub- 
lisher. Academics will happily write away, send off 
their contributions to journals, cross their fingers and 
hope for publication — preferably in one of the more 
prestigious titles in their field. 

This indifference to cash exists regardless of the 
means of publication: as long as means are found to 
record publication, to rank the source of publication 
and to record citations in other publications, authors 
will be happy to publish in whatever form is appro- 
priate. As long as the reward system can identify 
sources of publication, rank them and, for some 
purposes, draw upon records of citations, academic 
promotion panels will also be happy. 


Implications 

The first consequence of the discontinuity I described 
earlier is confusion: many systems and functions are
going to have to adjust and the process of adjustment 
is not particularly easy. Uncertainty affects all exist- 
ing systems of communication and information 
transfer, since any innovation in communication systems
is a social innovation: communication is the
fundamental social act. That fact means that the 
social customs and practices regarding recognition 
and reward in scholarly communities must change in 
an era of electronic communication. However, given 
how well rooted are the present practices, it may be 
some time before social mores catch up with the 
reality of electronic communication. 


The journal 
From the point of view of libraries (and, possibly, 
academic libraries in particular) the implications are 
fundamental and depend on changes in the systems 
of intellectual communication and the mores of re- 
ward and recognition. So — what is the potential? 
Consider the journal: in paper form it is embedded in 
a set of cultural and institutional practices that have 
determined its origins and present form. The learned 
journal (originating in the scientific societies of the 
17th and 18th centuries) was originally intended to 
disseminate the proceedings of meetings to members 
and to serve as an archive of the papers presented. 
For as long as scholarly publishing remained the 
province of the learned society, and as long as the 
scientific community was small, the original ideals 
were maintained and largely satisfied. However, the 
increasing size of the community and the demand for 
copies from those who were not members of the 
societies led to a need for professionally managed 
production and distribution systems and, hence, to 
the entry of commercial publishers into the process. 
The entry of the commercial publishers has af- 
fected scholarly communication in a number of ways; 
some positive, some negative — some both at the 
same time. For example, journals are now marketed, 
whereas when they were produced by the learned
societies they did not need to be, at least not to serve
their functions for the membership. This results in a...
from lecturer to senior lecturer, to reader, and to
professor still depends, at least in the so-called ‘old 
universities’, on research and publication, although 
most today take greater account of other roles and 
performances than they used to. Building upon their 
polytechnic past, when other criteria applied, the 
new universities often have rather different criteria, 
but they, too, now pay more attention to research and
publication because of the importance of these matters
in the national research assessment exercises.

At present, the assessment of research perform- 
ance is done chiefly by examining the publications — 
or at least by examining the record of publication — 
and by looking at the range and value of research 
projects gained by a candidate for promotion. Cita- 
tion data are rarely checked, although a candidate 
might refer to data if he or she had checked the 
databases. 

Recognition within the discipline is the result of 
many social and professional factors, of which pub- 
lication is but one. Appearance at conferences, 
editorships of journals, memberships of editorial 
boards, and, increasingly, visibility on the electronic 
network, all count. 

In short, recognition and reward depend upon a 
variety of factors, most of which (but not all) relate to 
the dominant mode of scholarly communication. 
That word dominant is important, because it sug- 
gests to me that, if other modes of scholarly 
communication come to dominate, the change will 
rapidly be taken account of in the recognition and 
reward system. 


The economics of scholarly communication 
Scholarly communication has a very curious kind of 
economic base: I know of no other industry where 
the raw material is virtually free. Because of the 
mores of the reward system, authors are eager to give 
their work to a journal publisher and they are equally 
eager to act as referees, editorial board members and 
editors of journals. Editors of journals may receive 
an honorarium (usually extremely small!), but that is 
the only example of a cash relationship in the entire 
process. (Since I have acted as founder and editor of 
two journals and am acting now as author, referee 
and editorial board member, my comments are not 
theoretical!) 

Not only do authors donate their work to the 
publisher, they also sign away their copyright, so 
that the protection of their intellectual property de- 
pends upon the goodwill of the publisher. The work 
may be protected, but not necessarily to the benefit 
of the author. 

Academic authors, as people, are not indifferent 
to money — they need it as much as anyone else and, 
given the decline in academic salaries in the UK, 
more than some in comparable occupations. How- 
ever, in relation to publication, money has no place — 
the object is to publish, not to get paid, because the 
financial benefits of publication flow through the 
possible for a reader to test alternative hypotheses on
those data, see them reflected in changes in the 
original author's graphs, and, perhaps, come up with 
alternative explanatory models for the phenomena 
described in the paper. Again, communication could 
take place around the use of the data and remote 
collaboration might ensue. 

Indeed, we could reach the point at which the best 
term to describe what the paper had become would be 
to call it an electronic seminar or, perhaps, an elec- 
tronic senior common-room, or even, as one colleague 
suggested on hearing my propositions — a kind of 
scholarly MUD! Whatever we call it, there can be little
doubt that what I have described is likely to serve the 
purposes of scholarly communication rather more 
effectively than delayed publication in print. 


The book 

The book in academia takes three main forms: the 
reference work, the specialized monograph and the 
text-book. All three are open to competition from 
electronic communications and, of course, reference 
works have been the first to be affected by that 
competition. The online financial and numeric 
databases have all but replaced their paper versions, 
certainly as far as the business user is concerned, and 
other areas are being encroached upon. In this area, 
the traditional publisher is on surer ground, since 
publication demands effort and cost in compilation 
and products are priced accordingly — although even 
here the packaging of products with computer sys- 
tems, as in the case of the various CD-ROM 
encyclopaedias, can reduce costs considerably. While 
much of this material may take electronic form, it 
seems unlikely that much will be made freely avail- 
able by enthusiasts. Perhaps the only question is, 
‘Which electronic form will be used - CD-ROM or 
publication over the Internet in the form of World 
Wide Web pages?’ 

The specialized monograph is a different matter. 
Highly specialized publications generally mean small 
markets and one is never surprised to find highly- 
priced monographs remaindered at less than half 
their original selling price. There is clearly scope 
here for electronic publication, since again, even 
with a high price-tag, the author is likely to have very 
little cash interest in publication. Furthermore, electronic
publication is likely to reach a wider audience,
and, through targeting publicity to relevant mailing
lists, can reach an appropriate audience of academic
peers. Recognition then becomes the extent to which 
the work is cited by others, perhaps the extent to 
which print copies are requested, and, if mounted on 
Web pages, the extent of use of those pages by 
others. The stage may even be reached at which demand
for cheaply produced paper copies reaches volumes
that make it worthwhile for the author to publish a
cheap printed version — electronic communication 
makes the sensitization of the market and the estima- 
tion of demand very much easier. 
proliferation of titles all aimed to capture some part
of a market that may have to be stimulated, rather 
than existing in the form of unsatisfied demand. 

Some of this proliferation is undoubtedly useful 
and journals in new specialisms get off the ground 
when it would have been difficult to await the emer- 
gence of a society to provide the base readership, or 
to have persuaded mainly conservative organiza- 
tions that a new journal was actually necessary. 
However, there is no doubt that some publishers take 
advantage of the library market in particular in pro- 
viding the subscriber base for something that, 
otherwise, they might not risk in the marketplace. 

There is also no question but that some publish- 
ers take advantage of the situation to load costs on to 
successful journals by increasing their prices well 
beyond any estimate of the rate of inflation — there 
even exists a mailing list on the Internet to discuss 
these miscreants. 

Electronic communication has the potential to 
make an impact on this situation in a number of 
ways: first, if the original ideal of the journal was to 
communicate the results of investigation among a 
community of scholars, that ideal can be satisfied 
more rapidly by electronic communication than it 
can by paper. Electronic communication is faster 
and, in some fields, the communication of results 
through electronic newsletters has become common- 
place — the practices are changing. Significantly, 
publishing on a new research area or from a new 
theoretical perspective can be done without seeking 
to create a paper journal. Secondly, developments in 
multimedia give a promise of a much richer kind of 
text than can be made available in print. The orni- 
thologist can include bird-song, if that is appropriate; 
the film critic can run video-film clips; the musicolo- 
gist can include recordings of strange instruments or 
sound clips of the composer being reviewed; the 
modern historian or political scientist can include 
sound or video interviews. These ideas are already 
present in CD-ROM encyclopaedias and their trans- 
fer to the Internet is already under way. 

Thirdly, the economics of the situation favour 
electronic publication by the author, or by consortia 
of scholars in particular fields producing electronic 
journals of the type described. 

Once the material is available in electronic form,
other possibilities open up. At the very basic level, 
the inclusion of email forms in papers (we'll need to 
find another word), will allow immediate communi- 
cation of comments to the author and exchanges may 
take place that turn a single-authored text into a 
multi-authored text. 

At another remove, we already have the capability
to use search engines within pages to search on
local or distant databases. Suppose the author of a
scientific paper placed the database of experimental
results behind the tables in the paper and made the 
graphs and the formulas relating to his or her hypoth- 
eses interact with that database. It would then be 
however, they remain recognizably the same kind of
institutions as they were one hundred years ago, 
particularly those that serve academic communities, 
and I suspect that there is huge, fundamental inertia 
in the system that will be difficult to overcome in 
moving towards a new metaphor of information in 
learning. 

When we consider the kinds of changes in the 
scholarly communication I have hypothesized, however,
it is clear that if they happen (or perhaps it is
even now possible to say when they happen), librar- 
ies must change radically. I have no doubt that the 
book has a future and that future libraries will have 
book-stocks. But I suspect that the books and journals
will play a lesser role and the role of the librarian
will be to provide a learning support system for the 
complex computer-mediated interactions that will 
take place among scholars and between teachers and 
learners. 

We have become accustomed to the term intermediary
in the information world, to denote the role of the
librarian in aiding online search and retrieval: perhaps 
words like facilitator and mediator will become more 
common. These words signify role changes, as the 
person needing information, or access to networked 
learning resources, seeks help in negotiating the maze. 
That help may take the form of face-to-face assistance, 
or of remote assistance over a network, or the librarian 
may be the creator of helpful Web pages (or whatever 
comes next) that provide links to the really useful 
resources in the institution's own network, or in the 
wider world Web. 

Interestingly, this is also likely to be what teach- 
ers are increasingly doing and perhaps we can 
envisage an entire university as a learning support 
network, with the roles of teacher and librarian over- 
lapping and interacting. 

In circumstances such as these, the word library, 
with all its historical connotations, seems very inap- 
propriate; I expect that we shall get lots of new words 
— learnet, perhaps, or cyberstudy — no doubt Wired 
magazine will come up with one or two more. 


Publishing 
As for publishing: what future can I suggest? The 
publishing industry is subject to these forces of dis- 
continuous change as much as, if not more than, 
libraries and librarianship. Publishers, too, will con- 
tinue to exist in all kinds of ways: I do not see popular 
magazines, for example, turning into World Wide 
Web pages (except for marketing purposes); I do not 
see the disappearance of fiction or its total transfor- 
mation into multimedia games. However, scholarly 
publishing is going to be electronic — of that I have no 
doubt — it is merely a question of time. The question 
publishers are asking is, ‘How do we retain a role in
the process?’ and I do not find it easy to suggest an 
answer. 

Patterns are emerging, with some publishers pro- 
viding contents lists and abstracts through gopher 
However, it is with the text-book that the most
interesting possibilities arise. Unless a text-book is 
chosen as the standard text for a subject in the 
national curriculum, or is chosen by a significant 
number of universities as a first-year text in a par- 
ticular field, it is unlikely that the author will make a 
great deal of money. Indeed, the main motivation for 
writing a text-book appears to be dissatisfaction with 
the text-books currently available: they either fail to 
cover those aspects of a subject a lecturer thinks 
appropriate, or they are out of date. 

Furthermore, it is highly likely that the lecturer
has been making handouts available to students for 
some years, trying out ways of presenting material, 
illustrating that material more effectively, devising 
exercises and so on. Today, much of that material is 
likely to be available in machine-readable form. And 
so the scene is set for the delivery of the same 
information to students over the campus network. 
Once the material is made available in this way, the 
process of developing the electronic text can be 
made thoroughly interactive. 

Students can email the author with problems of 
interpretation, ideas on illustrations and examples, 
additional bibliographical references — and similar 
information can be culled from term papers and 
essays, or from experiments, if the subject is labora- 
tory-based. The text may have hypertext links to 
Internet resources and these may be added to by the 
students as searching the net becomes a normal part 
of learning. Self-assessed or computer-marked tests 
can be delivered as part of the electronic text, thus 
enabling the students to monitor their own progress 
and for the author not only to do the same, but to 
identify points in the text that have failed to get 
across the necessary understanding. Logs of the use 
of the text and the way the students navigate through
it become available as research data on the use of 
electronic texts — either for the author or for colleagues
working in this area.

‘Text-book’ is hardly a word to be applied to an
artefact of this kind — the text has become an interac- 
tive electronic classroom. 

And, having done this for one group of students, 
why not seek fame on the Internet by making it 
generally available? 


The library 

If scholarly communication and the transmission of 
knowledge develop in the ways I suggest, what hap- 
pens to the library? Is the term electronic library 
based on a valid metaphor? Will the future library be 
anything like the modern library? 

Anyone who preaches the disappearance of the 
book and the journal, in totality, is likely to be on 
shaky ground. The death of libraries has been much 
exaggerated over the past forty years but there can be 
little doubt that libraries, like all organizations, are 
subject to change, and considerable changes have 
been experienced in the recent past. In spite of this, 
library the networked learning support system. Publishing
remains publishing — since documents are
published whether the medium is print or electronic — 
but perhaps publishing will be taken back into the 
scholarly community, or perhaps the scientific and 
scholarly societies and institutions will take it back, 
and possibly the role of the commercial publisher will 
be reduced. Whatever happens it seems possible to say
that the mores of reward and recognition are changing 
and will change further and that the change will be 
reinforced by the economics of the situation. But there
is something more than economics involved: I believe 
that we will also see the original ideals of scholarly 
communication realized on a wider scale than any 
Fellow of the Royal Society could possibly have 
imagined and that electronic communication will ac- 
tually reinforce the idea of community. 


menus or on Web pages, mostly freely, but some are 
making abstracts available only to subscribers. Some 
are making electronic copy available before the 
printed version — but only to subscribers, of course; 
and I am sure that all are thinking seriously about 
pricing on the basis of providing electronic access to
individual papers — to all potential buyers, rather
than only to subscribers. I think that it is pretty
certain that scholarly publishers must go down this 
route if they are to continue to have a role: it will 
certainly provide some very interesting data on the 
extent to which scholarly papers are actually used! 


Conclusion 

In this exploration of possible electronic futures we 
have seen the journal become the electronic seminar, 
the text-book the interactive electronic classroom, the 


A full set of papers from Elvira 95 will be published by Aslib in the autumn. 


Aslib Proceedings, vol.47, no.9 




Impel project: the impact on people of electronic libraries 




Catherine Edwards, Joan M Day, Graham Walton 


University of Northumbria at Newcastle, UK.


Paper presented at ELVIRA '95: 2nd International Conference on Electronic Library and Visual Information 
Research held on 2-4 May, 1995 at the Hilton Hotel, Milton Keynes. 


attitudes of information staff working in an increas- 
ingly electronic environment. 

The research is exploratory rather than descrip- 
tive in nature. It aims to establish key factors 
contributing to effective management of information 
provision in a networked campus environment. 
Within that the study revolves round: 


• organizational and social impacts of educational
  and technological change on library
  management and personnel

• factors which influence strategy in the implementation
  of electronic networks

• the knowledge, skills and training required by
  academic librarians and the implications for
  both initial and continuing training and development


It is accepted that every academic institution in 
the UK has its own individual history and set of 
circumstances influencing its operation. It is also 
accepted that electronic developments cannot be seen 
in isolation; they cannot be seen as separate from the 
enormous and rapid changes occurring in higher 
education, in particular, increased student numbers, 
increases in the price of library materials beyond the 
rate of inflation, changes in teaching methods and 
the falling unit of resource per undergraduate. No 
attempt is made to generalize on findings in one 
single institution, but, clearly, any findings made 
across a number of case study sites may be relevant 
to the wider sector. The study does not attempt to 
compare institutions in terms of quality or success. 
To do so would be invidious and unhelpful. 

This paper presents some initial findings of the 
IMPEL Project, at an early stage of analysis. It 
explores the attitudes of library and information staff 
working in an increasingly electronic environment, 
based on the results of one questionnaire referred to 
above (see appendix), and underpinned by com- 
ments offered at interview. It raises many questions; 
answers will be addressed at a later stage. 

The questionnaire was circulated to all library and 
information staff in the six case study sites. There was 
no pressure to complete the questionnaires, which
were treated confidentially; respondents were not asked 
to give their identity. Figures are given from three 
groups of respondents, in all 98 library assistants and 


The IMPEL Project is a research project run collabo- 
ratively by the Information Services Department and 
Department of Information and Library Manage- 
ment at the University of Northumbria at Newcastle. 
It investigates the human aspects of increased elec- 
tronic provision in UK academic libraries, focusing 
at this stage on the social and organizational impact 
on qualified librarians. 

The research is based on case studies in the 
libraries of six UK universities which were selected 
on the basis of a sampling survey in the form of a 
postal questionnaire sent to 98 chief academic librar- 
ians. The survey was designed to indicate those 
institutions which had achieved a significant degree 
of electronic development within their libraries, based
on criteria identified in an extensive literature survey¹.
It established those which already had a written
IT strategy, had extensive collaboration between 
library/information and computing services, whose 
staff had received some training to equip them to 
work in an electronic library and where all students
had access to the JANET network. The response rate 
to the questionnaire was 83 per cent, eleven institutions
fully meeting the criteria set. These eleven were
narrowed down to six by taking into account the age,
size and type of institution; these are the universities 
of Aston, Cardiff, Central Lancashire, Cranfield, 
Stirling and Ulster, a group which reflects a broad 
range of institution and provides a satisfactory geo- 
graphical spread ??. 

The case studies, the last of which was com- 
pleted in March 1995, draw from three sources to 
provide a balance between qualitative and quantita- 
tive information. The main source of qualitative 
data is in-depth interviews, 82 in all, conducted by 
the researcher, with qualified librarians at all levels, 
library assistants, information service directors, 
computing service directors and institutional managers.
These data are underpinned by documentation
such as IT strategies, mission statements, corporate 
plans, line management diagrams and training 
programmes. The third side of this triangular 
approach is provided by brief questionnaires based 
on Likert scales: one designed to determine indi- 
vidual training and development needs, one to 
establish people's perceptions of the role of IT 
within organizations and one to reveal the personal 
with suppliers and publishers, discussing prices and
licensing agreements. More time was spent in promot- 
ing services within and without the library, especially 
in academic departments. Staff reported huge increases 
in enquiry work, one assistant librarian stating that his
enquiry work had doubled in the last two years. Com- 
puting officers within libraries also reported heavily 
increased workloads as their services became more 
essential to the business of the library. 

The theory that the move towards electronically- 
held information would free up time for library staff 
was not held to be valid at this stage of development. 
As one information services manager said, ‘All across
the team, electronic information retrieval is changing
the nature of people's jobs but we're not seeing a
reduction in the workload.’ The overwhelming cry
from staff at all levels was for more time. 

• To what extent do jobs across the hierarchy
  need to be redefined to ease the growing
  stress at point of service delivery?


Statement: Use of electronic information 
increases my job satisfaction. 


Figure 2. [Bar chart: percentage of library assistants, information librarians and senior managers who agree, are undecided or disagree.]


There was strong agreement across the board 
with this statement. A factor contributing to increased 
job satisfaction was that the work was felt to be more 
interesting, despite increased workloads. Electronic 
information was felt to give students a better learning 
experience; they were also quick to provide feed- 
back. Many library staff enjoyed being at the forefront 
of information provision; one remarked that the sheer


senior library assistants, 65 assistant librarians, sub- 
librarians and subject or information specialists, and 
14 senior managers, forming a useful but non-repre- 
sentative sample. Staff were asked to respond to 
statements on the scale Strongly Agree, Agree, Unde- 
cided, Disagree, Strongly Disagree. On occasions the 
response was left blank or marked ‘not applicable’; 
this accounts for any discrepancies in the percentage 
totals. There may be local differences in results be- 
tween sites; these are not analysed at this stage. 


Statement: Use of electronic information 
reduces my workload. 
Although fairly balanced, more library assistants
agreed with this statement than disagreed. This prob- 
ably reflects the type of work they are engaged in, 
using electronic sources for bibliographic checking, 
cataloguing and acquisitions. Library assistants were 
conscious that the volume of work they undertake 
would now be impossible with manual systems. 

Professional staff, however, were firm in their 
disagreement with the statement. Professional staff 
attribute the increased workload to the changes asso- 
ciated with the introduction of electronic systems and 
sources. Chief among these was the pressure to learn
quickly the function and content of new systems as 
they appeared; in fact ‘learning’ emerged as a constant 
theme in discussions with library staff. The increased 
teaching load associated with the introduction of new 
systems was significant. Teaching both of users and of 
staff largely fell to information or subject librarians, 
Statement: I feel frustrated by my lack of
technical expertise. 


Figure 4. [Chart: ‘Frustrated by lack of technical expertise?’]


There was agreement, but not total agreement, 
with this statement. Library staff were generally 
felt to have low levels of technical expertise,
although clearly there was a wide range among 
them. 

At one extreme, a senior assistant librarian 
commented: ‘It's not just that we don't know how
to navigate the Internet — we're still all very new 
to the concept of the enter button.’ At the other 
extreme, many staff have developed a high degree 
of technical skill. They could often be irritated by 
recurrent problems with printers and terminals and 
resented the time spent fixing them. The result 
often was that there was greater dependency, and 
so greater pressure, on those individuals who had 
developed technical expertise; those individuals
often had a natural aptitude and personal interest
in technology. Library staff often seemed surprised
and almost caught unawares by the impact of the
nuts and bolts of the electronic environment: ‘The
role has changed. The job has changed. We're far
more into the guts of the machine than we ever
were before.’

• What level of hardware and software skills
  should now be expected of staff?


Statement: I am able easily to keep up with
electronic developments. 


[Chart: Easy to keep up with electronic ...]
There were very few library staff who admitted
that the electronic sources reduced their job satisfac- 
tion although there were sufficient reasons to do so; 
only one was brave enough to confess outright hos- 
tility towards them. Possibly for most staff 
the positive feeling of conducting a successful 
search outweighed the frustrations and pressures 
surrounding it. 

• Will traditional areas of library work suffer
  in the concentration on electronic delivery?


Statement: Electronic information sources
make me more effective in my work.

Here again there was broad agreement with the 
statement across all three groups, although a small 
number were undecided or disagreed. 
Figure 3.


Interviews suggested that staff felt they were 
more effective because of the breadth and depth of 
information available electronically, also its speed 
and immediacy. It led to better use of stock held 
within the library. It was often felt to raise the profile 
of library staff, to lend them greater status in the eyes 
of users. 

However, staff were quick to indicate where their 
effectiveness was lessened. Electronic systems had 
introduced a greater level of difficulty or complexity 
into the process. There was increasing dependency on 
the technology and often staff could offer no
alternatives when systems were down. Staff often felt 
vulnerable when a new service had been introduced 
without adequate warning and appropriate training, as 
was sometimes the case. They were frustrated by the 
range of different interfaces presented to them. They 
commented that although they had a heavy teaching 
commitment, they for the most part had no teaching 
qualifications and were unsure how effective their 
teaching was. They were frustrated by technical prob- 
lems: ‘If there’s a machine it will go wrong.’ 

One main worry was over the raised expectations 
of users, which could not always be met. There was 
also concern over the quality of end-user searches, 
over which library staff had no control. 

• Technical support, teaching skills and
  planned service provision are crucial. How
  can these be integrated into workloads?
of a hierarchical nature. Cross-group working was
seen as essential as technology increasingly 
underpinned information delivery. Within teams, 
individual skills levels tended to make gradings less 
relevant although there was little evidence of 
breakdown of the professional, non-professional 
divide. Staff perceived the need for more fluidity
between sections, although the gulf between systems
and services was often referred to. There was more
coming together of staff over something new, more
talk, more discussion; staff were aware that
collaboration was needed to avoid duplicating effort
and to make sure nothing vital had been missed.
Information librarians were taking on an 
instructional role vis-à-vis their colleagues; one had 
an increased sense of colleagues as customers. There 
was also a good deal of 'territoriality': not all 
professional staff willing tc share their roles with non- 
professional staff. There was a danger of 'electronic 
elitism', those being mcre au fait with systems 
somehow being perceived as more competent. 
e What effect will increasing reliance on e- 
mail to communicate with colleagues have on 
group dynamics? 


Statement: Working in an electronic environ- 
ment isolates me from the users. 


Similarly there was broad disagreement with this
statement. The high agreement among information 
librarians confirms the strong finding at interview 
Statement: I feel confident using NEW sources
of electronic information. 


Figure 6. Percentage of staff confident with NEW sources of electronic information, by staff group (library assistants, information librarians, senior managers).


Figure 5 shows a preponderance of staff who 
were not easily able to keep up with electronic 
developments although a sizeable percentage were 
undecided. 

Figure 6 reflects the greater confidence of 
information librarians and library assistants in using 
new sources, as compared with senior managers. 
Overall more people were confident than not. 

Where the introduction of new systems and 
sources had been well planned and staff had been 
informed, trained and supported, there appeared to 
be greater confidence. Staff expressed great diffi- 
culty keeping up with the rate of introduction of new 
electronic sources and the speed with which they 
needed to assimilate them. The speed of change was 
felt to be accelerating rather than slowing or steady- 
ing: 'They are adding more quickly than you can 
familiarize yourself.' 

The question of training was fundamental. There 
was a certain tension between the need to train to a 
level of awareness and the need to train to a deeper 
level, which had not been resolved. There was also 
the problem of refreshing the skills and knowledge 
gained but perhaps not put into use. Lack of familiar- 
ity with sources, infrequent use of them, left staff 
feeling exposed. It was suggested that the volume of 
sources becoming available, requiring mastery by 
small numbers of staff, tended to push 'subject 
specialism' to the background and bring 'informa- 
tion specialism’ to the fore. 

• How can senior managers give clear direction
in reconciling these competing pressures
and ensure that a balance is maintained?


Statement: Working in an electronic environment
isolates me from my colleagues.

There was broad disagreement with this state- 
ment. The 12 per cent of library assistants who 
agreed with the statement were most likely undertak- 
ing backroom tasks such as inter-library loans rather 
than front-of-house tasks. 

Many examples of cross-library working groups 
were found, not mirrored in staffing structure diagrams 
cessful than at operational levels where library staff
complained, often in vitriolic terms, of the different 
ethos, lack of service ethic, they found among com- 
puting staff. There were problems of communication. 
Staff and library users sometimes had difficulty in 
knowing where to go for help, whether to the library 
or to the computing service. In some libraries it was
noticeable that computing officers had been appointed
only recently and were immediately overloaded
with work. Some computing officers had been appointed
from outside because of the lack of the
desired response from the computing service. The
need for close cooperation was acknowledged by all 
although, again, ‘territoriality’ could be detected, 
where both services were striving for control or 
where their territories overlapped, such as the Internet. 
• How close are we to appointing staff who
will be expected to be equally qualified in IT 
and in librarianship? 
Statement: Electronic information sources 
could put me out of a job. 


Figure 10. Responses to the statement ‘Electronic information sources could put me out of a job.’


Many more people disagreed with this statement 
than agreed. Interestingly 14 per cent of library 
assistants did agree and 21 per cent of senior manag- 
ers were undecided. 

The 82 interviews conducted for the IMPEL Project 
are characterized by their enthusiasm and optimism 
for the future. Certainly academic libraries and 
librarians are under pressure. The changes they are 
dealing with are rapid and profound: ‘those changes 
are massive in that the whole basis on which many of 
them were trained has almost been pulled like a rug 
from underneath them.’ There are words of caution: 
‘When it’s all coming down the wire to you, do you
need a custodian?’ ‘I think we stand a good chance of
being extinct....We might come back in about 40
years’ time, when there’s a need for some critical
evaluative faculty...’ More typical, however, was: ‘It
opens this big door and we’re there to help people 
through it.’ This is a painful transition stage. For this 
reason the positive attitudes and responses of IMPEL 
participants are encouraging. If a positive approach is 
to continue we need strategic management at the 
highest level with visionary leadership and a willingness
to work in partnership with other professionals. 
that they are indeed central figures in delivering
information in an electronic environment. 

Information librarians noticed that where aca- 
demic staff and researchers had access to informa- 
tion at their desktops, the link between them was, to 
some extent, cut. The implications of supporting 
remote users had not been fully explored by informa- 
tion librarians. Conversely, information librarians in 
particular felt the need for closer liaison with aca- 
demic staff over journals and electronic sources, and 
over the needs of their students. 

There were varying levels of interest and expertise
in electronic information, among academic staff and 
among students: ‘At least 70 per cent of the students 
here wouldn’t know a formatted floppy disk from a 
red bus.’ Library staff needed to make a quick assess- 
ment of a user’s computer awareness; the traditional 
reference interview became more acute. There was a 
shift in emphasis from being an intermediary to giving 
advice, help and instruction. There was a great deal of 
one-to-one instruction as well as group teaching. Elec- 
tronic information created a demand for instantaneous 
information. Many library staff reported a closer rela- 
tionship with students who often came back with 
further questions: ‘If you make something available 
which opens lots of doors to them, they'll come back 
asking “how can I open this other door?”’

• How will the relationship with users be
maintained at a distance when the range
of support needed increases in complexity?


Statement: (Library-trained staff) I have no 
difficulty working effectively with computer- 
trained staff. 

(Computer-trained staff) I have no difficulty 
working effectively with library-trained staff. 
Roughly half of the respondents agreed with this
statement although nearly 30 per cent of information 
librarians and senior managers disagreed. 

In none of the sites was there operational 
convergence between library and computing depart- 
ments although in two, both departments reported to 
one director. In all sites, both departments were 
working increasingly closely together. At strategic
levels, this cooperation often appeared more suc- 


September 1995, Aslib Proceedings 


Impel project: the impact on people of electronic libraries 





project. British Journal of Academic Librarianship, 
8(3), 1993, pp.139-177. 


2. DAY, J., EDWARDS, C.E., WALTON, G. IMPEL 
— the impact on people of electronic libraries. 
In Enabling technologies for teaching and learning: 
national perspectives and futures. A Forum. Ed 
Augrey McCartan and Catherine Hare. Proceedings 
on the Forum on Enabling Technologies for Teaching
and Learning, The University of Northumbria at 
Newcastle 19-21 July 1994, pp.61-63. 
Impacts of the electronic library on information staff
in higher education: the IMPEL project. Ninth Annual
Computers in Libraries '95. Proceedings, London 7- 
9 March 1995. Oxford: Learned Information, 1995, 
pp.141-144. 


IMPEL has collected a large amount of rich infor- 
mation on some of the ways in which academic 
libraries are managing change at a time of unprec- 
edented development in the very core of library 
service — access to information. The 82 interviews 
are in the process of more detailed analysis using
qualitative data analysis software. The trends which
emerged were further discussed at a workshop in
June with two representatives of each participating 
library. We now hope to test our findings in a wider 
range of institutions. 
IMPEL Project


The questionnaire relates to the effects of the increasing use of electronic (e.g. CD-ROM,
networked information services) rather than print-based sources of information.


Job title ...........          Length of time in information work ...........


Next to each statement please circle the answer most closely matching your point of view:
SA = strongly agree   A = agree   U = undecided   D = disagree   SD = strongly disagree
(n/a = not applicable)

 1. Use of electronic information sources increases my job
    satisfaction ................................................  SA  A  U  D  SD  n/a
 2. Use of electronic information sources reduces my
    workload ....................................................  SA  A  U  D  SD  n/a
 3. Electronic information sources could put me out of a
    job .........................................................  SA  A  U  D  SD  n/a
 4. Electronic information sources make me more effective
    in my work ..................................................  SA  A  U  D  SD  n/a
 5. Working in an electronic environment isolates me from
    my colleagues ...............................................  SA  A  U  D  SD  n/a
 6. Working in an electronic environment isolates me from
    the users ...................................................  SA  A  U  D  SD  n/a
 7. I feel frustrated by my lack of technical expertise .........  SA  A  U  D  SD  n/a
 8. I am able easily to keep up with electronic
    developments ................................................  SA  A  U  D  SD  n/a
 9. For library-trained staff: I have no difficulty working
    effectively with computer-trained staff .....................  SA  A  U  D  SD  n/a
    For computer-trained staff: I have no difficulty working
    effectively with library-trained staff ......................  SA  A  U  D  SD  n/a
10. I feel confident using NEW sources of electronic
    information .................................................  SA  A  U  D  SD  n/a
Abstract

In the 1980s the dominant framework of MT was essentially ‘rule-based’, e.g. the linguistics-based approaches of 
Ariane, METAL, Eurotra, etc.; or the knowledge-based approaches at Carnegie Mellon University and elsewhere. 

New approaches of the 1990s are based on large text corpora, the alignment of bilingual texts, the use of statistical 
methods and the use of parallel corpora for 'example-based' translation. The problems of building large 
monolingual and bilingual lexical databases and of generating good quality output have come to the fore. In the
past most systems were intended to be general-purpose; now most are designed for specialized applications, e.g.
restricted to controlled languages, to a sublanguage or to a specific domain, to a particular organization or to a
particular user-type. In addition, the field is widening with research under way on speech translation, on systems
for monolingual users not knowing target languages, on systems for multilingual generation directly from
structured databases, and in general for uses other than those traditionally associated with translation services.


‘corpus-based’ methods. Firstly, a group from IBM 
published in 1989 the results of experiments on a 
system based purely on statistical methods. The effectiveness
of the method was a considerable surprise
to many researchers and has inspired others to experiment
with statistical methods of various kinds in
subsequent years. Secondly, at the very same time 
certain Japanese groups began to publish prelimi- 
nary results using methods based on corpora of 
translation examples, i.e. using the approach now 
generally called 'example-based' translation. For both 
approaches the principal feature is that no syntactic 
or semantic rules are used in the analysis of texts or 
in the selection of lexical equivalents. 

This paper will concentrate on these new devel- 
opments in MT research. It will not describe any one 
project in detail, and projects are mentioned only as 
examples of trends — there are many others; for 
further details and for references to the systems 
mentioned see my recent fuller survey’. The paper 
will also say almost nothing about methods already 
well established by the end of the 1980s. Furthermore,
nothing will be said about the use of commercial
systems or the development of aids for translators. 
The subject is exclusively the development of new 
methods in MT research. Of course, many of the 
methods are still experimental and have not yet been 
tested on a large scale. Nevertheless, the trends are 
real; since 1989 MT has experienced a reorientation of 
its methodology sufficient to justify calling the 1990s 
a genuinely ‘new era’. 


Rule-based systems 

Before describing these new corpus-based develop- 
ments in detail I shall begin with rule-based 
approaches, since here too there have been important 
theoretical and methodological developments. 


Introduction 

At the end of the 1980s, machine translation entered 
a period of innovation in methodology which has 
changed the framework of research. 

What has changed? What was the situation in MT 
five years ago? Between 1975 and 1988 a large 
number of operational and commercial systems had 
appeared: Systran, Logos, Meteo, and in particular 
many Japanese systems. These systems were based 
in general either on the ‘direct’ approach to transla- 
tion, or on the method of syntactic transfer. They 
relied on bilingual dictionaries sufficient for the text 
domains in question; linguistic analysis was neither 
particularly deep nor abstract, there was hardly any
semantic analysis, and the use of non-linguistic 
knowledge was entirely absent. 

As for research, the dominant framework of MT 
research until the end of the 1980s was the approach 
based on essentially linguistic rules of various kinds:
rules for morphological and syntactic analysis, lexi- 
cal rules, rules for lexical transfer, rules for syntactic 
generation, etc. Although the so-called ‘transfer’ 
systems dominated, e.g. Ariane, METAL, SUSY, 
Mu and Eurotra, there appeared in the later 1980s 
various ‘interlingual’ systems. Some were still es- 
sentially linguistics-oriented (DLT and Rosetta), but 
others adopted knowledge-based approaches, mak- 
ing use of non-linguistic information about the 
domains of texts to be translated. The most notable 
centre for this research has been Carnegie Mellon 
University. Nevertheless, these newer knowledge- 
based systems continued to be essentially rule-based 
systems, and in any case they remained somewhat of 
a novelty till almost the end of the decade. 

Since 1989 the predominantly rule-based frame- 
work has been broken by the emergence of new 
methods and strategies which are now loosely called 
DLT project in Utrecht, based on Esperanto as
interlingua, and the Rosetta project at Philips which
explored an isomorphic approach to constructing 
interlingual representations and the integration of 
Montague semantics. However, major interlingual 
projects continue to thrive, indeed with even more 
vigour, particularly in the knowledge-based approach 
at Carnegie Mellon University. The distinctive fea- 
tures are familiar: a neutral intermediary language 
for representing the meanings of texts (interlingua) 
and knowledge databases related to the domain of 
the texts to be translated. Several models have been 
developed over the years, and in 1992 the start of a
collaborative project with the Caterpillar company
was announced, with the aim of creating a large-scale
high-quality system for technical manuals in the
specific domain of heavy earth-moving equipment.
Other interlingual systems are, e.g. the ULTRA 
system at the New Mexico State University, and the 
UNITRAN system based on the linguistic theory of 
Principles and Parameters. There is also the Pangloss 
project, an interlingual system restricted to the vo- 
cabulary of mergers and acquisitions, a collaborative 
project involving experts from the universities of 
Southern California, New Mexico State and Carnegie 
Mellon. Pangloss is itself one of three MT projects 
supported by DARPA, the others being the IBM 
statistics-based project (see below) and a system 
being developed by Dragon Systems, a company 
which has been particularly successful in speech 
research but with no previous experience in MT. 
Five or six years ago saw the end of two of the
most significant transfer-based projects: the Ariane 
project at Grenoble University and the Eurotra project 
of the European Communities. These systems exemplified
typical features of the so-called ‘second-generation’
systems: batch processing with post-editing
and no interactive components, essentially syntax- 
oriented and stratificational with three stages of analy- 
sis, transfer and synthesis and the processes of analysis 
and generation passing through series of distinct 
levels (morphology, syntax and semantics), rela- 
tively abstract interface representations in the form 
of labelled trees, rules of transduction for changing 
trees from one level to another, and making little use 
of pragmatic and discourse information. 

Nevertheless, these projects do ‘live on’ to a 
certain extent in the Eurolang project based at SITE, 
a French company previously connected with the 
Ariane project. The project involves collaboration 
with the German company Siemens-Nixdorf and its 
METAL system and it is benefiting from experience 
with Eurotra. The first product of Eurolang is, how- 
ever, not an MT system as such but a translator’s 
workstation, the Optimizer. 

Other transfer-based systems continue in the 
present decade. There is, for example, the already 
mentioned commercial system METAL, and the 
major research at various IBM centres on the LMT 
(Logic programming MT) system. 

The beginning of this decade saw also the end of 
some rule-based ‘interlingual’ research systems: the 


Figure 1. Rules of formation and transformation (Eurotra) 
ANALYSIS

    Grammar rules G1  -> Representation L1
                              v  <- Transformation rules T1/2
    Grammar rules G2  -> Representation L2
                              v  <- Transformation rules T2/3
    Grammar rules G3  -> Representation L3

TRANSFER

    Grammar rules Gn  -> Representation Ln
                              v  <- Transformation rules Tn/n'
    Grammar rules Gn' -> Representation Ln'

SYNTHESIS
                              v
    Grammar rules G3' -> Representation L3'
                              v  <- Transformation rules T3'/2'
    Grammar rules G2' -> Representation L2'
                              v  <- Transformation rules T2'/1'
    Grammar rules G1' -> Representation L1'
                              v
                         target text
Since the mid 1980s there has emerged a widely
accepted general framework for rule-based systems.
It embraces all the formalisms which can be categorised
as variants or equivalents of ‘unification’ and
‘constraint-based’ formalisms. In essence, what these
formalisms have in common is that the large set of 
rules devised only for application in very specific 
circumstances and to specific representations has 
been replaced by a restricted set of abstract rules and 
the incorporation of the conditions and constraints 
into specific lexical entries. For example (Fig.2), to 
translate the English verb like into French plaire it is
necessary to transform the syntactic structure: the 
English subject (John) becomes an indirect object in 
French, and the direct object (Mary) becomes the 
French subject. These conditions are to be found in 
the sets of morphological, syntactic and semantic 
features of the lexical entries of like and plaire. A
slightly more complex set of features is needed to 
indicate the constraints attached to the English word 
likely and its French equivalent probable. The Eng- 
lish word requires an infinitival complement, while 
the French word requires a subordinate clause. 
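
The like/plaire case can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary layout, the names `TRANSFER_LEXICON`, `NOUNS`, `set_path` and `transfer` are hypothetical, and plain dictionaries stand in for feature structures and unification. The point it shows is that the structural change is stored in the lexical entry for like rather than in a separate transformation rule.

```python
# Hypothetical bilingual lexicon: the entry for "like" names the target
# predicate and records where each source grammatical function ends up
# in the target structure (SUBJ -> AOBJ OBJ, OBJ -> SUBJ).
TRANSFER_LEXICON = {
    "like": {
        "pred": "plaire",
        "map": {"SUBJ": ("AOBJ", "OBJ"), "OBJ": ("SUBJ",)},
    },
}

# Hypothetical noun equivalents, as in Figure 2.
NOUNS = {"john": "jean", "mary": "marie"}


def set_path(structure, path, value):
    """Store value at a nested path, creating intermediate levels."""
    node = structure
    for key in path[:-1]:
        node = node.setdefault(key, {})
    node[path[-1]] = value


def transfer(source):
    """Build the target structure from the constraints in the lexical entry."""
    entry = TRANSFER_LEXICON[source["PRED"]]
    target = {"PRED": entry["pred"]}
    for function, path in entry["map"].items():
        set_path(target, path, {"PRED": NOUNS[source[function]]})
    return target


result = transfer({"PRED": "like", "SUBJ": "john", "OBJ": "mary"})
print(result)
```

The resulting structure has marie as subject and jean embedded under the indirect object, which corresponds to the target F-structure given in Figure 2.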


The ‘lexicalist’ tendency

A characteristic feature of rule-based systems is 
the transformation or mapping of labelled tree 
representations. For example (Fig.1), in Eurotra a 
series of tree transductions was proposed: from a 
morphological tree into a syntactic tree, from a 
syntactic tree into a semantic tree, from an interface 
tree of the source language into an equivalent target- 
language tree, and so forth. Transduction rules 
require the satisfaction of precise conditions: a tree 
must have a specific structure and contain particular 
lexical items or specific syntactic or semantic 
features. In addition, every tree is tested by 
formation rules; in effect, a ‘grammar’ confirms 
the acceptability of its structure and the relation- 
ships it represents. A tree is rejected if it does not 
conform to the grammatical rules of the level in 
question: morphological, syntactic, semantic, etc. 
Grammars and transduction rules specify the 
constraints which determine the possibility of 
transfer from one level to another and hence, in the 
end, the transfer of a source-language text to a 
target-language text. 
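
The interplay of formation rules and transduction rules described above can be sketched as a small pipeline. This is a toy illustration, not Eurotra's formalism: the two levels, the tuple and dictionary representations, and the names `STAGES` and `run_pipeline` are all assumptions made for the example. Each stage pairs a 'grammar' that accepts or rejects a representation with a transduction to the next level.

```python
def morph_grammar(rep):
    """Formation rule for the toy morphological level: tags must be N V N."""
    return [tag for _, tag in rep] == ["N", "V", "N"]


def morph_to_syntax(rep):
    """Transduce a tagged word list into a labelled syntactic tree."""
    (subj, _), (verb, _), (obj, _) = rep
    return ("S", subj, verb, obj)


def syntax_grammar(rep):
    """Formation rule for the toy syntactic level: a four-place S tree."""
    return isinstance(rep, tuple) and len(rep) == 4 and rep[0] == "S"


def syntax_to_semantics(rep):
    """Transduce the syntactic tree into a predicate-argument structure."""
    _, subj, verb, obj = rep
    return {"pred": verb, "arg1": subj, "arg2": obj}


# Each stage: (formation rules for this level, transduction to the next level).
STAGES = [
    (morph_grammar, morph_to_syntax),
    (syntax_grammar, syntax_to_semantics),
]


def run_pipeline(rep, stages):
    """Check each representation against its grammar before transducing it."""
    for grammar, transduce in stages:
        if not grammar(rep):
            raise ValueError("representation rejected by formation rules")
        rep = transduce(rep)
    return rep


semantic = run_pipeline([("John", "N"), ("likes", "V"), ("Mary", "N")], STAGES)
print(semantic)
```

An input that fails the formation rules of its level, such as a word list without a verb, is rejected before any transduction is attempted.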


Figure 2. Constraint-based formalism (LFG) 


2 (a)

John likes Mary ⇔ Marie plaît à Jean

like, V:
    (↑ PRED) = like <SUBJ, OBJ>
    (τ↑ PRED FN) = plaire <SUBJ, OBJ>
    (τ↑ AOBJ OBJ) = τ(↑ SUBJ)
    (τ↑ SUBJ) = τ(↑ OBJ)

john, N:                          mary, N:
    (↑ PRED) = john                   (↑ PRED) = mary
    (τ↑ PRED FN) = jean               (τ↑ PRED FN) = marie

F-structure of target language:

    PRED  plaire
    SUBJ  [PRED marie]
    AOBJ  [OBJ [PRED jean]]


2 (b)

Student is likely to work ⇔ Il est probable que l'étudiant travaillera

likely, A:
    (↑ PRED) = likely <XCOMP> SUBJ
    (↑ SUBJ) = (↑ XCOMP SUBJ)
    (τ↑ PRED FN) = probable
    (τ↑ COMP) = τ(↑ XCOMP)

probable, A:
    (↑ PRED) = probable <COMP> SUBJ
    (↑ SUBJ FORM) = il
    (↑ COMP COMPL) = que

F-structure of target language:

    PRED  probable
    SUBJ  [FORM il]
    COMP  [PRED travailler, COMPL que, SUBJ [...]]
| SEM ARG1 X


| SEM ARG1 Y


these grammars is the simplification of the rules (and
hence the computational processes) of analysis, trans- 
formation and generation. Instead of a series of 
complex multi-level representations there are mono- 
stratal representations or simple lexical transfer. At 
the same time, the components of these grammars 
are in principle reversible. It is no longer necessary to
construct for the same language different grammars 
of analysis and generation: the same formalism and 
the same grammars can in theory be applied in both 
directions. 

Several groups have constructed general NLP 
systems based on unification and constraint-based 
grammars, which have been applied to translation 
tasks. The CLE (Core Language Engine) system, for 
example, has been used for automatic translation 
from Swedish into English and vice versa; the PLNLP 
(Programming Language for Natural Language 
Processing) system provided the foundation for trans- 
lation systems involving English, Portuguese, 
Chinese, Korean and Japanese; and the ELU engine
(Environnement Linguistique d'Unification) devel- 
oped at Geneva in Switzerland has formed the basis 
for a bi-directional system for translating avalanche 
bulletins between French and German. 

The trend towards lexicalist approaches has had 
an important impact on the construction of lexicons. 
With the increase in the range of information at- 
tached to lexical units the lexicon is no longer 
like (E1),
role (E1, experiencer, X1),
role (E1, stimulus, Y1)


gustar (E2),
role (E2, stimulus, X2),
role (E2, experiencer, Y2)


monolingual lexical unit (English):

[12]  ORTHO  like
      SEM    E1:  ARG0  E1
                  ARG1  X1
                  ARG2  Y1


monolingual lexical unit (Spanish):

[13]  ORTHO  gust-
      SEM    E2:  ARG0  E2
                  ARG1  X2
                  ARG2  Y2


bilingual lexical entry for like-gustar:

SPANISH [13]

ENGLISH [12]

Figure 3. 'Shake and bake' model 


The transformation rules themselves are now 
expressed as operations of rules of unification, which 
control the interaction of sets of features, the forma- 
tion of new sets and the elimination of illegitimate 
sets. As a result, the syntactic orientation which 
characterised many transfer systems in the past has 
been replaced by a trend towards lexicalist solutions. 
Many current research projects illustrate the ten- 
dency, including the UNITRAN system already 
mentioned. 

An extreme example of the ‘lexicalist’ approach 
is the method known as 'shake and bake'. There are 
no longer any structural representations, there are 
only sets of lexical representations (Fig.3). 

Translation proceeds through the identification 
of lexical items in the target language which satisfy 
the semantic constraints which have been attached to 
the equivalent lexical items in the source language. 
A translation is produced (or ‘baked’) from interactions
among the sets of features and the constraints
attached to target language words. 

Unification grammar and constraint-based gram- 
mars originated some ten years ago. Today, 
unification is a central concept for a large number of 
linguistic theories, and constraint-based grammars 
and formalisms have attracted many MT research- 
ers: e.g. Lexical Functional Grammar, Definite Clause 
Grammar, Head-driven Phrase Structure Grammar, 
Categorial Grammar, etc. The main advantage of 
gins in research of the 1980s or earlier, the emergence
of a wide range of what may collectively be
called ‘corpus-based’ approaches and methods rep- 
resents a new departure in MT research. It is these 
developments, above all, which justify the view that 
MT has entered a new era. 

The most dramatic development was the revival 
of the statistics-based approach to MT in the Candide
project at IBM. The major feature is the use of 
stochastic methods as virtually the sole means of 
analysis and generation. The IBM research is based 
on the vast corpus of French and English texts con- 
tained in the reports of Canadian parliamentary 
debates (the Canadian Hansard). The essence of the 
method is first to align phrases, word groups and 
individual words of the parallel texts, and then to 
calculate the probabilities that any one word in a 
sentence of one language corresponds to a word or 
words in the translated sentence with which it is
aligned in the other language. The most important 
point is that this is achieved without using any lin- 
guistic information. 

It will be seen (Fig.4) that the method has aligned 
proposal and proposition, now and maintenant, and 
implemented and the phrase mises en application. 
On the other hand, contrary to linguistic intuition, it 
has aligned will and seront, while the word be has 
not been aligned to any French word. On the basis 
of a large number of such English-French 
alignments the correspondence and probability 
frequencies are calculated. The English word not 
corresponds most often to two French words 
(fertility 2 having a probability of 0.758), and these 
two words are in general pas and ne (with probabilities
of 0.469 and 0.460); other correspondences are
less probable: non (0.024), pas du tout (0.003), etc. 
The method was evaluated by translation from 
English into French. 
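
How correspondence probabilities can be estimated from aligned sentences alone, with no linguistic information, can be sketched as follows. The code is a minimal expectation-maximization loop in the style of IBM Model 1, used here as a simplified stand-in; it is not the Candide system, it has no fertility or word-order component, and the three-sentence corpus and the names `train_word_translation`, `CORPUS` and `best_french` are invented for the illustration.

```python
def train_word_translation(pairs, iterations=25):
    """Estimate t(f | e) from sentence pairs by expectation-maximization."""
    french_vocab = {f for _, french in pairs for f in french}
    # Start with every co-occurring word pair equally probable.
    t = {(f, e): 1.0 / len(french_vocab)
         for english, french in pairs for f in french for e in english}
    for _ in range(iterations):
        count = {pair: 0.0 for pair in t}
        total = {}
        for english, french in pairs:
            for f in french:
                # Share one unit of evidence for f among the English words
                # of the aligned sentence, in proportion to current t.
                norm = sum(t[(f, e)] for e in english)
                for e in english:
                    share = t[(f, e)] / norm
                    count[(f, e)] += share
                    total[e] = total.get(e, 0.0) + share
        # Re-estimate the probabilities from the expected counts.
        t = {(f, e): c / total[e] for (f, e), c in count.items()}
    return t


# Toy aligned corpus (English sentence, French sentence); an assumption.
CORPUS = [
    ("the house".split(), "la maison".split()),
    ("the flower".split(), "la fleur".split()),
    ("a flower".split(), "une fleur".split()),
]

table = train_word_translation(CORPUS)


def best_french(english_word):
    """Return the French word with the highest estimated probability."""
    candidates = {f: p for (f, e), p in table.items() if e == english_word}
    return max(candidates, key=candidates.get)


for word in ("the", "house", "flower", "a"):
    print(word, "->", best_french(word))
```

Even on this tiny corpus the co-occurrence evidence is enough to separate house/maison from the/la, because la also appears with flower while maison does not.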


concerned just with morphological and grammatical 
data of source language words and with indicating 
equivalent words or phrases in target languages. It
now includes information on syntactic and semantic
constraints and non-linguistic and conceptual infor- 
mation, albeit often limited to restricted subject 
domains. The expansion of data has been most clearly 
seen in the lexicons of interlingua-based systems 
which include large amounts of non-linguistic information,
such as in the systems developed at Carnegie
Mellon or in the UNITRAN system. 

In recent years interest has grown rapidly in 
addressing the problems of constructing lexicons for 
MT, and a number of workshops devoted to the 
question have been held. Lexicon building is a complex
and expensive task if the lexicon is to be adequate
and sufficient for real and practical applications in 
operational situations. Many MT research groups are 
investigating methods of acquiring lexical informa- 
tion from readily available lexicographic sources, 
such as bilingual dictionaries intended for language 
learners, specialized technical dictionaries, and the 
terminological databanks used by professional trans- 
lators. At the same time, research groups are 
collaborating more closely with each other in projects 
for the construction of lexicons for a wide range of 
natural language applications and different types of 
systems, not just for machine translation but also for 
text analysis and information retrieval. The best 
known collaborative project in the MT field is the 
EDR project (Electronic Dictionary Research) sup- 
ported by several Japanese computer manufacturing 
companies. 


Corpus-based systems 
While the new approaches, methods and projects 


described so far can all be regarded as natural 
progressions from developments having their ori- 


Figure 4. Stochastic MT (Candide, IBM) 


Alignment:

The proposal will not now be implemented

Les propositions ne seront pas mises en application maintenant


English: not

Fertility      Probability
2              .758
0              .133
1              .106

French         Probability
pas            .469
ne             .460
non            .024
pas du tout    .003
faux           .003
plus           .002
etc.
to those used by the IBM group) or by more traditional
rule-based morphological and syntactic
methods of analysis. For example (Fig.5), if a translation
is being sought for the English word fields a
databank might give the following possibilities in
French: domaines, activités, champs. Each occurrence
is given in context. If there is an exact
correspondence, e.g. coal fields > bassins-houilliers,
the selection process comes to an immediate end.
But if there is no exact match, algorithms must be
invoked to find the correct equivalent.

For calculating matches, some MT groups use
semantic methods, e.g. a semantic network or a
hierarchy (thesaurus) of domain terms. Other groups
use statistical information about lexical frequencies
in the target language. The main advantage of the
approach is that, since the texts have been extracted
from databanks of actual translations produced by
professional translators, there is an assurance that the
results will be accurate and idiomatic. For example,
one of the greatest difficulties of rule-based MT
when working from French into English is the selection
of the correct equivalent of the preposition de; a
databank offering a large number of examples could
be of major assistance. And there are more complex
problems where even greater help could be available,
e.g. the translation into French of the phrase
have an effect on (Fig.6).

At present, the example-based approach has been
used most often to complement more traditional
methods based on linguistic rules. However, there
are some researchers who contend that the effectiveness
of the approach can be fully tested only if it is
used as the sole method of generating target text.


English                              French

the main fields                      les principaux domaines
the following fields                 les domaines suivantes
these two fields                     ces deux domaines
the specialized fields               les domaines spécialisés
the para-medical fields              activités paramédicales
the magnetic fields                  les champs magnétiques
the coal fields                      les bassins-houilliers
the corn fields                      les champs de blé


Figure 5. Bank of example translations: field


English                              French

have a direct effect on              ont une influence directe à
have a direct effect on              intéressent directement
have a direct effect on              ont eu une répercussion directe sur
has had a marked effect on           a largement influencé
had a positive effect on             s'est avérée positive dans
had a highly negative effect on X    X en auraient été gravement affectés
will have a decisive effect on       influencera de façon déterminante
would have a detrimental effect on   aurait de fâcheuses répercussions sur

Figure 6. Example databank for have an effect on 
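
The selection process described for the example bank (stop at once on an exact correspondence, otherwise invoke an algorithm to find the closest example) can be sketched roughly as follows. This is an illustration only: the miniature bank reuses a few rows of Figure 5, the function names are hypothetical, and plain string similarity stands in for the semantic networks or statistical methods that the MT groups are said to use.

```python
from difflib import SequenceMatcher

# Hypothetical miniature example bank, using rows from Figure 5.
EXAMPLE_BANK = {
    "the main fields": "les principaux domaines",
    "the magnetic fields": "les champs magnétiques",
    "the coal fields": "les bassins-houilliers",
    "the corn fields": "les champs de blé",
}


def similarity(a, b):
    """Crude character-level similarity between two phrases (0.0 to 1.0)."""
    return SequenceMatcher(None, a, b).ratio()


def find_example(phrase, bank=EXAMPLE_BANK):
    """Return (matched example, its stored translation, match score)."""
    # Exact correspondence: the selection process ends immediately.
    if phrase in bank:
        return phrase, bank[phrase], 1.0
    # No exact match: fall back to the most similar stored example.
    # (An assumption of this sketch; real systems would use semantic
    # or statistical matching instead of string similarity.)
    nearest = max(bank, key=lambda example: similarity(phrase, example))
    return nearest, bank[nearest], similarity(phrase, nearest)


if __name__ == "__main__":
    print(find_example("the coal fields"))
    print(find_example("the wheat fields"))
```

Note that the fallback returns the translation of the nearest stored example, not a finished translation of the input; adapting that example to the new phrase is the harder step that the matching algorithms have to address.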


What surprised most researchers was that the 
results were so acceptable: almost half the phrases 
translated either matched exactly the translations in 
the corpus, or expressed the same sense in slightly 
different words, or offered other equally legitimate 
translations. Obviously, the researchers would like 
to improve these results, and the IBM group pro- 
poses to introduce more sophisticated statistical 
methods. But they also, rather surprisingly, intend to 
make use of some minimal linguistic information. 
Although they set out to disprove the traditional 
linguistic rule-based approaches, they are ready to 
experiment with any method which gives good results;
the IBM team are true empiricists! Some
examples of what is proposed are: (a) the treatment
of all morphological variants of a verb as a single
word, and (b) the use of syntactic transformations
(e.g. Has the store any eggs? > The store has any
eggs QINV; John does not like turnips > John likes
do not M1 turnips) to bring the structure closer to
that of the target language.
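
The probability tables of Figure 4 can be read as simple lookup tables. The sketch below copies the figures given there for the English word not and picks out the most probable entries; the variable and function names are illustrative assumptions, and the code says nothing about how the IBM group actually estimates or combines these values.

```python
# Values copied from Figure 4 for the English word "not".
# FERTILITY: number of French words produced -> probability.
FERTILITY = {2: 0.758, 0: 0.133, 1: 0.106}

# TRANSLATION: French word or phrase -> probability (the "etc." rows are omitted).
TRANSLATION = {
    "pas": 0.469,
    "ne": 0.460,
    "non": 0.024,
    "pas du tout": 0.003,
    "faux": 0.003,
    "plus": 0.002,
}


def most_probable(table):
    """Return the key with the highest probability in a table."""
    return max(table, key=table.get)


def ranked(table, top=2):
    """Return the `top` most probable (key, probability) pairs."""
    return sorted(table.items(), key=lambda item: item[1], reverse=True)[:top]


if __name__ == "__main__":
    print("most probable fertility:", most_probable(FERTILITY))
    print("top translations:", ranked(TRANSLATION))
```

Read together, the two tables suggest why the approach copes with French negation: not most often corresponds to two French words, and ne and pas are by far the most probable candidates.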

The second major corpus-based approach ben- 
efiting likewise from improved rapid access to large 
databanks of text corpora is what is known as the 
‘example-based’ (or ‘memory-based’) approach. 
Underlying the approach is the basic notion that 
translation often involves the finding or recalling of 
analogous examples, the discovery or recollection of 
how a particular expression or some similar phrase 
has been translated before. The example-based ap- 
proach is founded on processes of extracting and 
selecting equivalent phrases or word groups from a 
databank of parallel bilingual texts, which have been 
aligned either by statistical methods (similar perhaps 
output text in the target language was a largely ne-
glected area of MT research. Now, major efforts are
devoted to questions of stylistic improvement of out- 
put and to discourse features. 

Much of the impetus for this research has come 
from increasing attention to the need to provide 
natural language output from searches in databases. 
While most of this research concentrates on generat- 
ing text in a single language, some of it is devoted to 
multilingual generation. One of the first groups to
tackle this topic was, not surprisingly, a team based 
in Montreal long involved in MT. This group has 
worked on a system for producing marine forecasts 
in French and English, and on a system for generat- 
ing bilingual summaries of statistical data on the 
labour force. 

Another important trend of the last five years is the
recognition of a demand for types of translations 
which have not previously been studied. In the past, 
systems were built generally for bilingual users, for 
translators and for those knowing both source and 
target languages. In addition, the texts translated had 
to be post-edited. The needs of those not knowing the 
target language were neglected. Businessmen engaged 
in foreign trade often need to communicate fairly 
simple standard messages in an unknown language 
(e.g. confirmation of an order, booking of accommo- 
dation, etc.) In recent years, groups have experimented 
with ‘dialogue-based MT’ systems where the text to 
be translated is composed in a collaborative process 
between man and machine (e.g. at UMIST, the Uni-
versity of Brussels, Grenoble University and at the
Science University of Malaysia). In this way it is
possible to construct a text which the system is known
to be capable of translating without further reference 
to the author, which needs no revision and for which 
good quality output can be assured. 


Controlled language, domain-specific and user- 
specific systems 

In practice nearly all MT systems have been largely 
limited to restricted domains. Although originally 
designed as general-purpose systems, many of the 
well-established systems have been limited in opera- 
tion to particular ranges of subjects, since large 
dictionaries are needed and developers have concen-
trated on domains where there is greatest demand.
Indeed, some of the most successful implementa-
tions of MT have been in environments where the
language of input is ‘controlled’ in some respect. 
Other systems have been specifically designed for 
particular subject areas (‘sublanguages’) or for the 
needs of specific users. In each case, they are efforts 
to overcome the known deficiencies of full-scale MT, 
in particular the difficulties of analysing complex 
sentences, of selecting correct target language equiva- 
lents and of generating idiomatic output. Consequently, 
the same systems may feature combinations of the 
three options: (a) control of input texts, (b) restriction 
to a sublanguage, and (c) design for a specific user. 
A bank of bilingual parallel text can also be used
more directly and immediately as a translation tool
itself. In this respect, several groups have been de-
veloping methods for the alignment of corpora of
bilingual texts to provide easily accessible knowl-
edge banks (or ‘translation memories’) as integral
components of workstations for human translators,
and indeed such a feature is already commercially 
available in the workstations from STAR and 
TRADOS. 

The availability of large corpora has encouraged 
experimentation in methods deriving from the com- 
putational modelling of cognition and perception, in 
particular research on parallel computation, neural 
networks or connectionism. A distinctive feature is 
the computation of the strengths of links between 
nodes of networks, and the adjustment of the 
weightings as a result of actual analyses, i.e. the 
network 'learns' about links and their strengths for 
later use. Furthermore, alternatives can be processed 
in parallel. In natural language processing 
connectionist models are ‘trained’ to recognise the 
strongest links between grammatical categories (in 
syntactic patterns) and between lexical items (in 
semantic networks). 
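
The idea of a network that computes the strengths of links between nodes and adjusts the weightings after each analysis can be illustrated with a very small sketch. Everything here is an assumption made for illustration: the class name, the category labels and the simple update rule (each observation moves a link strength a fixed fraction of the way towards 1 or 0) are not taken from any of the systems mentioned, and a real connectionist model would be considerably more elaborate.

```python
from collections import defaultdict


class LinkNetwork:
    """Toy network that 'learns' link strengths between pairs of nodes."""

    def __init__(self, learning_rate=0.1):
        self.learning_rate = learning_rate
        # (node_a, node_b) -> strength between 0.0 and 1.0, starting at 0.0
        self.strength = defaultdict(float)

    def observe(self, node_a, node_b, confirmed=True):
        """Adjust a link after one analysis confirmed or rejected it."""
        key = (node_a, node_b)
        target = 1.0 if confirmed else 0.0
        # Move the current strength a fraction of the way to the target.
        self.strength[key] += self.learning_rate * (target - self.strength[key])

    def strongest_link(self, node_a, candidates):
        """Return the candidate node most strongly linked to node_a."""
        return max(candidates, key=lambda node_b: self.strength[(node_a, node_b)])


if __name__ == "__main__":
    net = LinkNetwork()
    # Hypothetical training data: a determiner is usually followed by a noun.
    for _ in range(3):
        net.observe("DET", "NOUN")
    net.observe("DET", "VERB", confirmed=False)
    print(net.strongest_link("DET", ["NOUN", "VERB"]))
```

The stored strengths are then available for later use, which is the sense in which such a network is said to learn.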

The potential relevance to MT is clear enough for 
both analysis and transfer operations, given the diffi- 
culties of formulating accurate grammatical and 
semantic rules in traditional approaches. As yet, 
however, within MT only a few groups have done 
some small-scale research in this framework, e.g. in 
the speech translation research at Carnegie Mellon 
University, in an example-based approach by McLean 
at UMIST, and in the Matsushita transfer-based pro- 
totype system. 

Connectionism offers the prospect of systems 
‘learning’ from past successes and failures. Previ- 
ously, learning has meant that systems suggest 
changes on the basis of statistics about corrections 
made by users, e.g. during post-editing. This approach 
is seen in the commercial Tovna system and in the 
experimental PECOF feedback mechanism in the Japa- 
nese MAPTRAN system. A similar mechanism has 
been incorporated in the NEC PIVOT system. 


Text generation 

The example-based approach has strengthened a trend 
which was already evident in the rule-based frame- 
work, namely the much greater attention paid to 
questions of generating good quality texts in target 
languages. Ten years ago it was commonly believed 
that the most difficult problems of MT concerned 
syntactic and semantic analysis, the disambiguation 
of homonyms, the resolution of structural ambiguity, 
and the identification of the antecedents of pronouns;
in other words, the main problem area of MT was the
understanding of the text to be translated. The thrust of
research on linguistic rules and on knowledge bases 
reflected this concentration on problems of analysis. 
At this time, the problem of generating idiomatic 
A new era

Research on MT has passed through five eras to the
present day. The first period began with the memoran-
dum from Warren Weaver in 1949 which effectively
launched MT research. The second began with the
1954 demonstration of a simple system for translation
from Russian to English, which encouraged govern-
ment agencies in the US and elsewhere to support
large-scale projects. This period was brought to an end
by the notorious ALPAC report in 1966, which high-
lighted the ‘failure’ of MT research to meet its promises.
The third ‘quiet’ era, when MT was virtually ignored,
lasted until about 1975, with a revival of interest in
Canada, Europe and Japan. Whereas the systems of the
first two eras were generally based on the ‘direct’
approach, the dominant framework after ALPAC was
the various transfer and interlingual approaches based
on linguistic rules. As described in this paper, there are
now new methods and trends: approaches based on
bilingual text corpora, statistical methods, example-
based approaches, and new methods using unification
and constraint-based grammars. These innovations
have all appeared in the last five years and indicate the
beginning of a new era for MT. If the direct method
characterised the ‘first generation’ and the indirect
methods of transfer and interlingua characterised the
‘second generation’, what might be the basic features
typifying the future ‘third generation’?

The general view of many experts is that future 
systems will combine traditional rule-based methods 
and the newer statistics-based and example-based 
methods. They will be hybrid systems. But what 
kind? In one possible perspective, the linguistic meth- 
ods of the ‘indirect’ systems will provide the 
foundation upon which processes involving domain- 
specific knowledge banks, statistical data and 
examples of translated texts will operate. 

With respect to the base of linguistic rules it may 
be envisaged that in future hybrid systems: 

• rules will be less ambitious and complex than
those of indirect systems

• syntactic analysis will be limited to the recogni-
tion of surface structures, phrase constituents and
dependency relations

• there will be almost no deep analysis of logical
relations (quantification, scope of negation)

• semantic analysis will be limited to the identifica-
tion of roles: agent, instrument, etc.

• lexical information will be extracted mainly from
standard sources such as general-purpose diction-
aries; consequently the lexicon will include only
syntactic categories and perhaps crude semantic
features

• fairly simple semantic features will be used for
initial disambiguation of input

• rules of lexical and structural transfer will prob-
ably apply to shallow representations (although
not as crude as in the IBM Candide approach)

• the formalisms will be those of unification and
constraint-based grammars.
The control of the vocabulary and of the gram-
matical structures of texts submitted for translation
reduces the difficulties of constructing satisfactory 
lexicons of sufficient coverage, and the problems of 
ambiguity and selection of equivalents. Although the 
costs of preliminary editing may be high, post-editing 
is reduced considerably. The Xerox implementation 
of Systran and the many successful systems devel- 
oped by the Smart Corporation are probably the best 
known examples of controlled language MT. One of 
the largest controlled language projects currently is 
the CATALYST system under development for Cat- 
erpillar Corporation. Whereas controlled language 
has previously been used in systems of the ‘direct 
translation’ design, this will be the first application in 
a more advanced ‘interlingua’ system. 
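
As a rough illustration of what control of vocabulary and structure involves at the pre-editing stage, the sketch below checks a sentence against an approved word list and a length limit. The word list, the limit of 20 words and the function name are hypothetical; they do not describe the rules of Systran at Xerox, the Smart systems or CATALYST.

```python
import re

# Hypothetical approved vocabulary for a tiny maintenance sublanguage.
APPROVED_WORDS = {
    "remove", "replace", "the", "filter", "valve", "before", "after",
    "starting", "engine", "check", "oil", "level",
}

# Hypothetical limit on sentence length, a simple stand-in for
# controlling grammatical complexity.
MAX_WORDS = 20


def check_sentence(sentence, approved=APPROVED_WORDS, max_words=MAX_WORDS):
    """Return a list of problems an author would have to fix before translation."""
    words = re.findall(r"[a-z]+", sentence.lower())
    problems = [
        f"word not in approved vocabulary: {word}"
        for word in words
        if word not in approved
    ]
    if len(words) > max_words:
        problems.append(f"sentence too long: {len(words)} words (limit {max_words})")
    return problems


if __name__ == "__main__":
    print(check_sentence("Remove the filter before starting the engine."))
    print(check_sentence("Yank the filter."))
```

A sentence that passes such checks stays within the lexicon and structures the system is built to handle, which is where the reduction in post-editing comes from.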

The design of systems for a specific sublanguage is
also not new: the well-known Meteo has been translating
meteorological reports for 15 years. Among the 
sublanguage systems of recent years there are the 
CRITTER system for reports on the stock market 
under development in Montreal, the already mentioned
projects at ELU, Pangloss, and the extremely ambi- 
tious projects for the development of spoken language 
translation. The Japanese ATR project has been under 
way already for seven years and will continue to the 
end of the century; it is a system for registration at 
international conferences and for hotel booking by 
telephone. ATR is also collaborating with Carnegie 
Mellon University and Karlsruhe University in the 
development of the JANUS speech translation system 
in the same domain. The European Verbmobil project 
is aiming to develop a transportable aid for face to face 
English-language commercial negotiations by Germans
and Japanese who do not know English fluently. 

In the past, there were few systems built by users 
themselves. One example is PAHO (Pan American 
Health Organization), where two systems were devel- 
oped for translating from English into Spanish and 
from Spanish into English. In the last few years there 
have been several user-designed systems, typically 
with restricted vocabularies, for a particular domain 
and often based on a specific sublanguage. Some of 
these systems have been developed by software com-
panies for clients. For example, Volmac Lingware
Services has produced MT systems for a textile com- 
pany, an insurance company, and for translating aircraft 
maintenance manuals; Cap Gemini Innovation devel- 
oped TRADEX to translate military telex messages 
for the French Army; and in Japan, CSK developed its 
own ARGO system for translation in the area of 
finance and economics, and now offers it also to 
outside clients. Such user-designed systems are an 
encouraging sign that the computational methods of 
MT and NLP are now spreading more and more 
outside the limited circles of researchers. The systems 
may perhaps only rarely be innovative from a theoreti- 
cal or methodological point of view, but they are often 
very advanced computationally. It is a trend which 
could well expand rapidly in coming years. 
Niftyserve and Minitel networks; in the United States,
Systran is available via networks; and CompuServe 
has just announced an MT service for its users. These 
are new challenges to MT researchers. What kinds of 
systems are needed for these new services and de- 
mands? We may expect rapid changes in the field of 
MT in the near future and ultimately the appearance 
of new systems meeting more closely the actual 
needs of a wide variety of potential users. Fully 
automatic systems capable of producing idiomatic 
texts comparable to human translation are no longer 
the goal of MT research. It is now largely focused on 
the development of systems limited to sublanguages 
or to specific technical fields. 

In favourable conditions, limited-domain systems 
which are far from perfect can be and are being used 
successfully and cost-effectively. Of course, everyone 
wants to see improved quality, but it is not expected in 
the near future. The new approaches described in this 
paper have yet to be fully tested in experimental 
systems, and so it is unlikely that any commercial 
system based on any methods of the ‘third generation’ 
can be expected before the end of the century. 
The corpus-based methods will act to refine and
enhance the results and methods, perhaps as follows:

• translation examples stored in aligned bilingual
text banks will be used for more delicate
disambiguation during source text analysis and for
selection of target language equivalents

• statistical information on lexical collocations and
monolingual vocabulary frequencies will aid syn-
tactic and semantic analysis of phrases, monolingual
disambiguation, and selection of idiomatic target
language phrases

• data on probabilities of bilingual equivalences
will be used during lexical transfer

• domain-specific knowledge banks will aid mono-
lingual and interlingual disambiguation

• terminological databanks will be used to assist
disambiguation of complex phrases and in the
selection of target equivalents

• feedback and connectionist methods will be em-
ployed to improve grammars (and/or rule bases)
and to enhance monolingual and bilingual lexicons

• stylistic features and discourse information will
improve output for specific needs and users.

In addition, it can be assumed that many of the 
newer ‘hybrid’ systems of the third generation will 
be directly integrated in general computer-based 
systems for the production, transmission and man- 
agement of documents (i.e. more sophisticated 
workbenches for translators.) 


Use of systems 

The new research developments described in this 
paper are taking place against a background of a 
rapidly expanding marketplace for MT and increasing 
numbers of users. In recent years, the number of pages 
translated automatically has increased considerably — 
at present, more than a million pages annually, or 
about 300 million words a year?. The expansion has 
taken place in large multinational companies and in 
translation agencies, particularly for the translation of 
technical manuals. But there has also been an increase 
in the numbers of non-professional users. Many have 
purchased cheap PC-based systems, which are cer-
tainly crude in linguistic terms. The effectiveness and
quality of the systems may be doubtful, but the needs 
of the users are undeniable. To a large extent, MT 
researchers have not taken up the challenge of design- 
ing systems for the non-professional ‘occasional’ 
translator. They have also been slow until recently to 
acknowledge the importance of standards and bench- 
marks for the evaluation and comparison of the 
performance, quality and efficiency of commercially 
available systems. 

We can predict an expansion of users of large- 
scale systems and of users of personal computer 
systems, and we can also predict an expansion of use
of MT systems over electronic networks; in France 
and Japan, MT is already offered on the PC-VAN, 
Prologue

Our ability to generate information and transport it about the planet on superhighways of optical fibre is about 
to change the way in which we communicate, work and live. There is not a single aspect of our future that will go 
untouched by the communication and computing revolution that is now upon us. The change we are about to 
witness will overshadow the impact of the printed word, industrial revolution, and physical transport. The next 
major wave of IT development will focus on the delivery of information and experience on demand, in the right 
form, at the right time, at the right price to fixed or mobile terminals anywhere, over networks of optical fibre, 
radio, satellite, and optical wireless. Bandwidth, distance and time will no longer be significant cost elements as 
service and access become dominant. 


The human condition

From about the year 1600 onwards the human race
has been consuming raw materials at a compound
rate of ~7% per annum¹. If the vast majority of these
resources were not renewable, at this rate of growth
the whole of planet earth would be consumed within
the next 450 years, there would be no planets in the
solar system within 550 years, and the sun itself
would be consumed in only 650 years. This simple
extrapolation of growing consumption, which will
increase as the third world industrializes, serves to
illustrate that the prognosis of the Club of Rome in
1972 is ever more certain². Exponential growth by
humanity based on accelerating raw material
consumption is clearly impossible: there are ‘limits to
growth’. Within 100 years we can expect to see
severe global difficulties due to the continued
burning of hydrocarbons resulting in increased pollution
levels and the denuding of raw material stocks³.

Whilst there is an abundance of technology to
address the fundamental problems associated with
an increased expectation of improved life style, it is
certainly not the case that everyone on the planet can
enjoy the energy consumption standards of the USA.
The average American currently consumes an
average of 8kW for heating, air conditioning, physical
transport, lighting and other services⁴. If mankind is
to live and prosper whilst maintaining the planet in a
habitable state, then a much lower level of
consumption is necessary. As the sun’s energy falling on earth
amounts to 1kW/m², and given the available area for
solar collectors, and their electrical conversion
efficiency, we might assume a reasonable target to be
1kW per human⁵.

Physical travel constitutes a primary destructive,
and perhaps largely unnecessary, activity with
almost 100bn passenger miles (costing over £15bn)
consumed in the UK alone just to get to work⁶,


Figure 1: Total UK costs in specific service/product categories 
the technology, and should be able to participate. All
the key technology elements to make this happen are 
to hand, and even the market drivers are engaged and 
established in this direction. 


The magic of IT 

Information technology (IT) is the only sector de- 
livering an exponential growth in capability whilst 
reducing raw material and energy costs. Since 1960 
our ability to transport information over any dis- 
tance has doubled each year whilst the use of raw 
materials and cost has reduced — Figs 3 and 4. 
Today, optical fibre transports over 90% of the UK 
telephone, fax and data communication. This new 
medium has an inherent capacity to transport the 
entire contents of over 100 human minds on just 1 
fibre at a rate faster than the fastest express train’. 
Similarly, the packing density of electronic cir- 
cuits, information storage and processing power 
has doubled every year with power consumption, 
raw material and cost falling exponentially’. We 
now enjoy a computing and communicating capa- 
bility that was unimaginable in the 1960s. In ten 
years we might expect to see computers 10³ times
more powerful than those of today. Within twenty
years? the power could increase by 10⁶ times, and
in thirty years the power might have increased to
10⁹ times. Machines of such power and capability
will evolve human characteristics of adaptability, 
intelligence and personality. They will also see 
computing and communications infrastructures ac- 
cessible by the entire population. IT will no longer 
be the preserve of an elite who have the opportu- 
nity, access and skill sets that are necessary to drive 
today's user unfriendly devices. Talking to the ma- 
chine, having hands in the screen and being able to 
see people and information in electronically gener- 
ated environments will become the norm. 
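
The figures projected for ten, twenty and thirty years follow from simple arithmetic if capability goes on doubling every year, as the text says it has in the past. The short sketch below shows the calculation; the function name is illustrative, and the constant doubling rate is an assumption carried over from the text, not a forecast.

```python
import math


def growth_factor(years, doublings_per_year=1.0):
    """Total growth after `years` of steady doubling."""
    return 2.0 ** (years * doublings_per_year)


if __name__ == "__main__":
    for years in (10, 20, 30):
        factor = growth_factor(years)
        power_of_ten = round(math.log10(factor))
        print(f"{years} years: x{factor:,.0f}, roughly 10^{power_of_ten}")
```

Ten doublings give a factor of 1,024, so each decade of annual doubling adds about three orders of magnitude.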


Much of this could be avoided today, and should 
be negated by information technology before the 
turn of the century. Not only is travel expensive in 
raw materials and energy, it also consumes vast 
amounts of time, with over £15bn per year in traf- 
fic jams for the UK — and £10bn of this in London 
alone! Similarly we might anticipate large gains in 
medicine, health care (£35bn), care, education 
(£25bn), training (£35bn), and entertainment 
(£31bn) in the UKS. The cost equation for each of 
these sectors is largely dependent on people, mate- 
rial, energy and transport. The application of new 
technologies is long overdue with the potential for 
substantial savings in raw materials, energy, time 
and productivity. 


Where is the money? 

The money devoted to the development of a future
information based society can only grow substan- 
tially at the expense of established industries and 
modes of operation. Looking at the distribution of 
wealth available we see four major target areas in 
Figs 1 and 2 that lie outside the established IT 
industry. Health, education, entertainment and physi- 
cal travel represent key opportunity segments, which 
are in turn augmented by publishing, shopping, sur- 
face mail, and other peripheral activities. 

From the history of earlier industries, it is clear 
we can expect dramatic reductions in human in- 
volvement in manufacturing and services, with 
subsequent cost savings. This change will be predi- 
cated by new generations of robots, materials and 
manufacturing processes, releasing human and fi-
nancial resource for a new wave of society.
Interestingly, the financing of the information soci- 
ety looks to be relatively minimal when compared to 
all previous change since the printing press. In the 
information society everyone could have access to 


Figure 2: Total UK (government and consumer) expenditure on health, education, entertainment and travel 
Figure 3: History and principal trends in cable telecommunications


ter of the 1960s, whilst the personal computer is 
realising a processing, storage and display ability for 
the office and home that completely surpasses the 
mainframe computers of only 10 years ago. This is 
all characterized by an exponential growth in ability, 
and a corresponding fall in cost — exponentially more 
for exponentially less! 

The scale of change is perhaps best exemplified 
by the reduction in raw material usage. In the UK 
there is now an installed base of over 3M km of 
optical fibre supporting the communication needs of 
the 57M population. The entire fibre infrastructure 
was manufactured with just 90 tonnes of sand (silica) 





Cost reduction — capability growth 

In 1956 the cost of a transatlantic telephone call was 
£2.80/minute (Fig.4) — today it is £0.5/minute — 
computers in the home were unthinkable, and the 
storage and transport of information was almost 
wholly conducted by paper. Today, we have a rap- 
idly expanding global network of optical fibres 
already transporting 65% of all the telephone calls 
world-wide. The first pocket calculator on the
market in the early 1970s cost over £80 for just four 
functions — today superior technology is given away 
with petrol! A low cost electronic wristwatch now
has more processing power than a mid-range compu- 


Figure 4: The falling cost of a transatlantic phone-call 
tion, a combination of technophobia, natural inabil-
ity and bad interface design has frozen out over 80%
of the human race from using the technology. The 
move towards GUIs (graphical user interfaces) and mouse
based systems has seen the latent ability and de-
mand beginning to be realized. Perhaps the most
important next step will be the advent of the really 
friendly computer with voice I/O. When augmented
by artificial intelligence that is anticipatory, and able 
to fine tune responses in sympathy with the charac- 
teristics of the individual user, then we will have a 
really powerful and user orientated interface. 


Physical travel 

Why do we travel vast distances just to cluster to- 
gether to work in offices? The answer to this question 
is complex, but reasonably obvious — we come to- 
gether to communicate, interact and organize 
ourselves in a rather tribal and ritualistic way. With 
information technology this is no longer necessary nor 
relevant in the strict sense. Many already go home to 
do real work! The office has become an information 
exchange, an area of interaction, meetings and high 
chemistry. Solitude, isolation and concentration have 
to be sought in new places. Moreover, the chemistry of 
interaction can be achieved using an electronic me- 
dium so we face the prospect of increasing numbers of 
home, or dispersed, workers away from any central-
ized office. This is not in the far future, it is happening
now, and is evidenced by the number of empty office
blocks and buildings throughout the western world. It 
has been estimated that the empty office space across 
North America is equivalent to that occupied, or not, 
in San Francisco! Numerous companies are already 
moving to ‘hot desking’, with the provision of fewer 
desks than people as changing habits release people 
from being in the office every day. This is a trend that 
can be expected to increase, and accelerate in concert 
with IT. 


New capabilities 

It is evident that developments in artificial intelli- 
gence, visualization, virtual reality and telepresence 
will realize new capabilities. Humankind was never
designed to cope with spreadsheets, the written word,
keyboards and small screens that only present a
partial picture of a wider activity. Imagine a virtual 
reality interface with your visual cortex flooded by 
information from spectacle mounted or active con- 
tact lenses augmented by directional audio input, 
tactile gloves and prosthetic arms and fingers that 
give you the sensation of touch, resistance and weight. 
Imagine also the prospect of a surrogate head that is 
either machine or human that can allow you to be 
teleported into environments anywhere on the planet 
with great accuracy and reality. This might lead to
the dictum: ‘what you see I see, what you hear I hear, 
what you feel I feel’! Alternatively, contemplate the 
convenience of large visual displays with high defi- 
nition in two or three dimensions. People could 
compared to the thousands of tonnes of copper cable
it replaced! Similarly, the latest desk top computers 
are being designed to use materials that are over 95% 
recycled. In both cases the performance and capabil- 
ity are vastly superior, power consumption 
increasingly minuscule, and production costs far less 
than previous technologies. Instead of using watts of
power for a telephone call we now use milliwatts.


No frontiers — No barriers 
The digital revolution in computing and communica- 
tion has, so far, mainly impacted the office and place 
of work. Before the year 2000 it will have entered the 
home in the form of integrated entertainment and 
information systems. Information will become a com- 
modity, accessible across the planet at insignificant 
cost. This will be made possible by continued ad- 
vances in chip, satellite, radio and optical fibre 
technology that will also reach out to the home, car, 
and/or computer you wear. This information society 
will see the barriers between work, play, home and
office breached. The nature of commerce and society
will change radically with no effective national and
international boundaries. This could pose significant
political problems with governments and regulators
seeing control slip through their fingers - a bit, is a bit,
is a bit - there is no difference between a telephone call,
CATV, broadcast or data! Regulating the flow, distri-
bution and access to information could be like trying to
regulate the rain - a futile occupation!
Organizations themselves will become increas- 
ingly, and in some cases totally, dispersed. They will 
be virtual and organic with people contributing in an 
electronic rather than physical space. People will 
work when and with whom they choose, as appropri- 
ate, having access to machine intelligence and 
information. This will revolutionize the way busi- 
ness is conducted and economies are driven. Already 
we see those at the forefront establishing group 
environments where work packages are passed around 
the globe, like a baton, from one daylight zone to 
another. Programmes, projects, developments, crea- 
tivity and collaboration can then be non-stop, 
non-national, but virtual, fast and far more produc- 
tive and effective than today. 


Interfaces for people 

The realization of a global information network 
presents a major challenge. Its impact will span 
education, medical applications, leisure, entertain- 
ment, business and commerce through to shopping. 
The move to the information society presents sub- 
stantial technological and human interface problems 
for all IT related industries as the ideal is to deliver 
information on demand, in the right form, at the right 
time, at the right price to a fixed or mobile terminal 
anywhere. However, today's IT industry has a multi- 
plicity of hardware, software and interfaces, mainly 
designed to promote customer lock in, and it is clear 
that this presents an immediate challenge. In addi- 




IT — a glimpse of the future 







number of people involved in this industry can be 
expected to rise rapidly, and hopefully it will utilize 
much of the wasted human talent previously over- 
looked. All of this does not deny the continued 
existence of farming, the manufacture of clothing, 
hard and soft technologies — quite the reverse. All of 
these activities are required, but our expectation is to 
make them far more efficient and less intensive in
human terms. A net result will be an increased per-
centage of the population working in an information
space that need not be location specific.


Information 

In Europe there are over 6 million photographs of 
church windows on record, and within five years we 
may have video on demand (VOD) systems offering 
a choice from 10,000 videos. The Library of Con- 
gress requires 3.5km of new book shelving every 
year to accommodate all the new publications. It 
has also been estimated that the total of mankind’s 
published material — and this is sometimes equated 
to knowledge itself — doubles every three years. It is 
also clear that a huge amount of information be- 
comes irrelevant, out of date and represents a largely 
meaningless clutter. 


appear in full proportion, with the right colour, a voice 
that emanates from the lips and not from a box at the 
side in a distortion free and convincing manner. All of 
these technologies lead to a feeling of being there! 


Past industry 

There is great concern over the decline of manufac- 
turing industry, and the changing work patterns 
already evident. In the short term the drive for greater 
efficiency, improved output, and migration from a 
manufacturing to an information society will involve 
some hardship and trauma. In the long term however, 
it has to be seen as part of a general migration that 
started a thousand years ago. Ever since we diversi- 
fied from hunting to farming we have been on a path 
that has seen 80% of the population involved in 
farming only 700 years ago reduced to less than 1%
today. The same is true for clothing production, the
industrial and electronic revolutions. As each new
wave of technology has peaked and been made in-
creasingly efficient it is being replaced by new and
more beneficial alternatives. Today we are poised to 
see the emergence of information itself becoming a 
raw material - something we manufacture, prize, sell 
and perhaps most importantly of all, network. The 


Figure 5: Percentage of UK workforce employed by agricultural, manufacturing and service sectors 
newspaper, magazines, books and databases, to dis-
card large sections that are of no interest to us, we 
may have the option to pay more for less. The fo- 
cused news, articles, details, data and information 
would be far more beneficial. Barring serendipity 
that is! The future retailing, supply, updating, valida- 
tion, security, charging, copyright, and format of 
publications thus pose some major challenges. 


Technology — positive feedback 

Why is all of this so exciting and why does it offer 
such tremendous potential? Most positively, the hu- 
man race can achieve far more in a shorter time! The 
standard working lifetime of the previous generation 
was about 100,000 hours. We can now achieve their 
output in less than 10,000 hours or more impres- 
sively do 10 times less work and get 10 times the 
results! The next generation looks set to overtake us 
in a similar manner — provided we can keep pace with 
the technology. 


Telepresence — infomatics 

The developed world's population is getting older 
and it is highly unlikely that there will be the re- 
sources to provide the care that is necessary. In Japan 
programmes are underway to manufacture robots to 
take on the task. Other alternatives involve the 
teleportation of expertise, experience and presence 
itself. The technologies that allow surgeons to be 
positioned inside the human body through an 
endoscope or through the use of a surrogate head 
peering into an incision are already with us. The 
prospect of remote diagnosis, inspection and surgery 
is real, and initial experiments are underway. Before 
long we will see surgeons in California performing 
operations in London. Robots are already being
used in hip replacement, brain and eye surgery. The 
trip to a doctor's surgery or the hospital outpatients' 
department could soon become an automated and 
remote activity. Further developments include the 
remote monitoring of patients through electronic 
interfaces mounted on the body. For diabetics and
other drug and medicine dependent people it is already
possible to be monitored at a distance by
remote computers that can administer and optimize
dosage. So far experiments have been confined to
hospital wards, but there is no reason why this cannot
be realized globally.

All of these concepts can be extended to other 
disciplines including the repair and maintenance of 
oil rigs, electronic and power installations and even 
activities in the home. Being able to call experts, 
teleport them to your location, and then have them 
guide you through the necessary steps to effect a 
solution is only a short step away — and is already 
being tested. 


Remote education 


We might also anticipate that the very process of 
education will have to change. The vast majority of 
What we need are technologies to help us navi-
gate through the growing field of information, find
what we want, access and manipulate data so we can 
get down to the kernel — decision and action! The 
necessary technologies are all under development 
with Artificial Intelligence (AI) for navigation and
location, plus Hebbian decay mechanisms for fil-
ing, and automatic text summarisation. However,
there are still significant problems associated with 
the complexity and size of systems, databases and 
connectivity expected by the year 2000. At this 
juncture much of the display, software, hardware 
platforms and interfaces will be available to a wide 
proportion of the population.


Publishing — more for less 

In the 15th century the Vatican library had fewer than
400 books and it was one of the biggest libraries on 
the planet. Today most of us own more books as 
individuals and the Library of Congress has an esti-
mated 22 to 24 million volumes on 500km of
shelves. To access and update an information space 
so vast is clearly impossible. We cannot afford the 
trees, the paper, the energy and, most of all, the sheer 
inaccessibility. Already we see CD-ROM technol- 
ogy delivering 650Mbytes of information per disc. 
Such a medium is capable of holding all the classics 
and almost all of the specialist books we could desire
in a few tens of discs. Indeed, a staggering 2000 
classics have recently been published on one CD- 
ROM. Whole encyclopaedias, art galleries and 
museums — perhaps most importantly, complete with 
animation and interaction — are now possible and 
will become increasingly available. At the present 
rate of progress we should each have enough storage 
capacity at work and home to hold the contents of the 
Library of Congress within 15 years. But this is not 
the whole story — we only need access — we do not 
need copies of everything. The first book store on
the Internet has opened with 50 volumes selling at $5
each. At this price our purchasing algorithm changes!
Buy it and try it — who cares — it is so cheap I can 
afford to throw it away if I don't like it! 

To date well over 20,000 volumes are already in 
digital form and will sell at a fraction of their paper 
predecessors’ prices. Some publishing houses are 
already predicting that they see the end of paper 
publishing in sight. For technical and reference vol- 
umes this is credible. For the rest it might not be — 
paper is very user friendly! The extent of this differ- 
ence is embodied in the riddle: what is the difference 
between a laptop and a newspaper? The answer is:
no one takes a laptop into the toilet! For general 
reading we need liquid crystal paper - high resolu-
tion, definition, contrast, flexibility and compactness.
Then we might see novels and light reading trans- 
formed also — but then again there are alternatives 
such as talking books! 

A further small advance could see custom infor-
mation online. Instead of buying a complete
dependence on ancient modes of working. In the
remaining years of their lives these people are likely 
to see more change for mankind than has been expe- 
rienced in the previous 100 years. A major challenge 
therefore will be the finessing of the technology to 
make it wholly acceptable to the greater part of the 
population. This will require some adept engineering 
to create new interfaces that are humanized and 
present a natural mode of immersion for the vast 
majority of the human population. If it is to work, the 
technology has to be available and accessible to all 
people of all ages. This probably represents the ma- 
jor challenge and is a vital one if we are to succeed. 


Limits to change 

All the technology we have briefly considered results 
in a reduced need to travel, a positive contribution to a 
greener planet, and a wider choice of experience for all 
concerned. In the information society the need to 
travel vast distances from home will be drastically 
reduced. New forms of short distance transport per- 
haps might even see the demise of the internal 
combustion engine and its deleterious effects on hu- 
manity and the environment. A further outcome is 
likely to be the restructuring of urban conurbations. 
As fewer people need to travel into cities and increas- 
ing numbers of office blocks become vacant there 
may even be a move to ruralize the environment, 
remove many of the buildings and return sites to their 
former state. The distributed society working in an 
information world will create new environments, new 
cities of the mind, new places to meet and work. Again
prototypes are already in the research laboratories and 
every day new ideas and formats emerge. The rate of 
change is unlikely to be limited by the evolution rate 
of the technology, more the inability of mankind and 
society to subsume these advances and make use of 
them in a positive and economic way. 

The information society might just be the ulti- 
mate challenge, and opportunity, for humanity — we 
can opt out, but we cannot escape. We have to rise to 
the challenge, solve the problems, and access the 
power of information. 
universities are small, with small departments in-
creasingly stretched by a widening curriculum. Staff
have to cope with larger numbers of students and 
teach a wider range of courses in a shorter time. Why 
then do we have fifteen lecturers giving the same 
lecture on different days to different groups of stu- 
dents? It is possible for all the students to attend any 
one of the lectures, or indeed, for the one lecture 
course to be prepared and delivered by a small team 
at one university. This would allow specialism by the 
departments, an increased efficiency and depth of 
understanding, a real opportunity to conduct mean- 
ingful research and perhaps most valuable of all an 
ability to allow students to mix and match modules 
and create their own degrees at a distance. The
distributed degree among five or six key universities 
would then be a real possibility! 

The nature of teaching and education can also be 
expected to see radical change. Since the ancient 
Greeks we have hardly strayed from scratching in the 
sand. Moving to the blackboard, to the white board 
and overhead projector is hardly revolutionary given 
the technology at our disposal. Might we expect 
experiments on the screen to become as respectable 
as experiments in the laboratory? After all, they are 
actually far more powerful and instructive! Online 
tutorials, lectures and interactive teaching packages 
for the rapidly expanding science and technology 
based curriculum would seem a necessity. Packages 
are already being introduced in medicine and other 
professions. The dismantling of high tech structures; 
simulation of air flow across an aircraft wing; current 
flow in an electronic circuit; or the dissection of a 
frog or human organ are already available on trial 
systems. In some universities it is already impossible 
to get a degree qualification without your own per- 
sonal computer. 

Perhaps in the not too distant future we will be 
able to cruise the world's institutions, virtual and 
real, and drop in for a refresher course presented by 
an internationally recognized expert — anywhere, 
anytime! Perhaps project reports will become active, 
and interactive documents with high quality 
visualizations which offer immediately informative 
and accessible representations of physical or other 
situations. Most radical of all, mathematics and the 
physical sciences may be opened up to all. Those 
who have found the traditional long haul of 15 years 
of education, required to get even a rudimentary 
understanding, too tough, difficult, or plain indigest- 
ible, might find that visualisation and/or virtual reality 
puts them in the picture. 


Our children 

Already we see our children exhibiting tremendous 
willingness and ability to move into this new world 
of information — they do not present the problem — 
we do! The challenge has to be the rigid mind sets of 
the over thirties who will have to be weaned off the 
motor car, physical travel, the mass use of paper and 
features, this article takes a specific focus, and that
focus is in relation to the publishing of bibliographic 
and reference information. 


Costs, Pricing and Value 

The price that the market is prepared to pay for 
information is related to its perceived value. Suc- 
cessful marketing may modify the perceived value of 
certain types of information or documents, in par- 
ticular, popular fiction or biographical works, but the 
perceived value of bibliographical and reference 
works and therefore the price that the consumer is 
prepared to pay are determined to a greater extent by 
the price of competitive products or the price that the 
market is accustomed to paying. So, when publishers
are considering the marketing of electronic reference 
or bibliographical works the starting point for pricing 
policies must be the price of earlier printed products. If 
the electronic product offers added value, because the 
information can be accessed and used more easily and 
flexibly, or because the electronic product is, say, 
multimedia, then the publisher may be able to price 
the electronic product more ambitiously. On the other 
hand, publishers may seek to reach a larger market by 
choosing to price these products more competitively 
than the print equivalent. 

Pricing, in conjunction with sales volume, deter-
mines revenue. Revenue must exceed expenditure if
a publisher is to remain in business. However, it is 
important to recognize that it is the totality of the 
publisher's activities that needs to remain viable, or
in some cases, depending upon organizational struc- 
ture, a specific list or division, such as reference or 
children's literature division. Furthermore, although all pub-
lishers would seek to remain in profit in each year,
some can weather a bad year in the interests of long
term viability, stability and preferably growth. 
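
The relationship between price, sales volume, revenue and expenditure described above can be sketched as a simple break-even calculation. This is a minimal illustration, not the authors' model: the function names, the split into fixed and per-copy costs, and all figures are assumptions introduced here.

```python
import math


def revenue(price: float, copies_sold: int) -> float:
    """Revenue is price multiplied by sales volume."""
    return price * copies_sold


def break_even_copies(fixed_costs: float, price: float, unit_cost: float) -> int:
    """Smallest number of copies at which revenue covers total expenditure."""
    margin = price - unit_cost
    if margin <= 0:
        raise ValueError("price must exceed the per-copy cost to break even")
    return math.ceil(fixed_costs / margin)


if __name__ == "__main__":
    # Hypothetical figures for one reference title (illustration only).
    fixed_costs = 50_000.0  # editorial and database work
    unit_cost = 10.0        # per-copy production and distribution
    price = 250.0

    copies = break_even_copies(fixed_costs, price, unit_cost)
    total_cost = fixed_costs + unit_cost * copies
    print(copies, revenue(price, copies) >= total_cost)
```

A publisher judging a whole list rather than a single title would apply the same comparison to the summed revenue and expenditure of the list.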

Olaisen (1992) in presenting a review of the 
various methods that can be used as the basis for 
pricing explores some of the different approaches 
that publishers can take to the management of the 
relationship between pricing, costs and value. Some 
of the key approaches are: 


Introduction 

There is currently a general air of euphoria concern- 
ing the potential for electronic publishing on 
CD-ROM and over the Internet. Yet many publishers 
are proceeding cautiously and are supplementing 
their print products with CD-ROM or online full text 
products. Most publishers view the future for elec-
tronic publishing as uncertain, not because they
cannot see the potential of the technology, but because
they are unsure of the extent of acceptance of elec- 
tronic information products in the marketplace. 

The potential market size for a range of electronic 
information products is unknown. The marketplace 
is currently very fluid leading to publishers adopting 
complex pricing strategies in an attempt to secure 
adequate returns (eg Rowley (1993)). Brindley (1993) 
asserts:

"No one should think that the area of pricing,
particularly of newer forms of information
products and services, is stable or that there
are simple guidelines. Product pricing is fluid,
changing and rather uncertain as technology
is changing the relative economics of tradi-
tional forms of publications against optical
media, and of online databases versus
CD-ROMs, and so on. The market is in flux
and only a limited number of significant inter-
national players are yet making profits from
these services."

Publishers are in business to make a profit. Eco- 
nomic viability of their product range is therefore 
crucial. Economic viability depends upon revenue 
exceeding costs by a sufficiently comfortable mar-
gin. Other articles by the authors have considered
some of the pricing strategies for electronic informa- 
tion products (Butcher and Rowley (1994)). This 
article focuses on the costs of producing print and 
electronic information and documents as a basis for 
considering the viability of electronic information. 
We seek to identify the cost components in publish- 
ing taking into account the complementary nature of 
print publishing and those in electronic publishing. 
Since each area of publishing has its own unique 
namic and forward looking publisher with an excit-
ing portfolio of products, and may therefore have an
unquantifiable effect on long term revenues from 
other products. 

In practice these combinations of pricing policies 
have led to complex pricing structures which differ 
depending on the channel of distribution. Electronic 
information is currently characterized by a mixture 
of pricing strategies which include: 

a) subscriptions 

b) one-off payments 

c) pay-as-you-go charges based on access.


Factors affecting the costs associated with 
bibliographic and reference publishing 

When assessing the viability of new publishing ven- 
tures it is easy to be satisfied with the marginal 
costing approach described above. Such an approach 
assumes that the key costs under consideration are 
those directly associated with the creation of the 
specific product. In general the situation is not as 
simple as this, with overhead costs of various types 
that also need to be covered. Costs associated with 
the production of electronic and print products can 
be viewed as falling into three categories: 

a) database costs 

b) distribution media costs 

c) overhead costs 
We discuss each of these costs in turn. 

Database creation is often common to both the 
printed product and any equivalent electronic prod- 
ucts. The costs associated with database creation, 
editing and processing tend to be fixed costs that 
depend upon the size of the database. Database crea- 
tion costs are a major cost in the production of 
reference books and bibliographical databases. Cur- 
rent bibliographical sources such as the British 
National Bibliography or Books in Print or other 
major reference works, require a large database which 
is consistently updated by the editorial team. Once 
established, this database may be used for several 
print or CD-ROM titles, thus spreading the fixed 
costs over a product range. The number of versions 
of dictionaries produced by such publishers as Collins, 
Longman or Oxford University Press, all derived 
from a single database, are prime examples of ways 
of reducing the fixed costs for a specific title. 
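
The effect of spreading a fixed database cost over a product range can be illustrated with a short sketch. The allocation rule (proportional weights), the product names and the figures are all assumptions made for illustration; the article does not prescribe how the cost should be attributed.

```python
def allocate_fixed_cost(database_cost: float, weights: dict[str, float]) -> dict[str, float]:
    """Share one fixed cost across titles in proportion to the given weights."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return {
        title: database_cost * weight / total_weight
        for title, weight in weights.items()
    }


if __name__ == "__main__":
    # Hypothetical product range derived from a single database.
    weights = {"print annual": 3, "CD-ROM edition": 2, "pocket edition": 1}
    shares = allocate_fixed_cost(120_000.0, weights)
    for title, share in shares.items():
        print(f"{title}: {share:,.0f}")
```

Each additional title derived from the same database lowers the share of the fixed cost that any one title has to carry.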

Output costs are different for the different distri- 
bution media. For print the cost depends partly on the 
length of the document, and whether, for example, in 
a bibliographical database, cumulations are printed. 
The creation of a CD-ROM incurs costs associated 
with the creation of the master and then per copy 
costs for each copy. These costs, for most biblio- 
graphic and reference databases, tend to be fixed 
unless the database cannot be accommodated on one 
CD-ROM. Whereas print costs rise with the number 
of volumes, CD-ROM costs remain the same. In 
principle, CD-ROM costs are lower per copy for a 
long run than many print sources if the publisher can
1. Optimal pricing, where a substantial profit is made,
and this occurs at the crosspoint between marginal 
revenue and marginal cost. This approach is used 
by online hosts when they charge the highest 
possible average price during peak hours, with 
rebates during off-peak-time. 

2. Pricing according to value, allows price discrimi- 
nation by which groups of users are charged prices 
that they are willing to pay. Online hosts and CD- 
ROM publishers often offer special discounts to 
educational institutions. Segmentation may be by 
type of user (employees, students, consultants), by 
type of application (public/private databases, bib-
liographic/full-text databases/numeric data), or by
time (day/term or vacation).

3. Pricing for full cost recovery is where all costs are 
recovered. To guarantee full cost recovery, prices 
must be completely inelastic to cater for drops in 
demand, and are normally based on what the pub- 
lisher hopes will be a conservative estimate of 
total sales. 

4. Marginal cost pricing is adopted either when sub- 
sidies have been agreed or to encourage maximum 
use of capital intensive facilities. The main limita- 
tion of this approach, which is often used where 
public funding supports an element of the cost, is
that it can encourage a monopoly which is unre- 
sponsive to new market demands. 

5. Free distribution of services is possible where a 
full subsidy is provided. Here costs are typically 
paid through taxation or other central government 
mechanisms. This has been the traditional ap-
proach where print-based services are accessible
free of direct charge through libraries. In most
situations the added value of electronic informa- 
tion leads to providers being reluctant to offer free 
distribution, though some European Communities 
online databases have no direct user charges, for 
instance. The Internet is the prime vehicle for 
information available free to users at present (apart 
from telecommunications costs). 
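
Two of the approaches listed above, full cost recovery and marginal cost pricing, differ only in whether fixed costs are loaded onto the price. The sketch below contrasts them under simplifying assumptions: a single product, a known per-copy cost, and invented figures. The function names are hypothetical.

```python
def full_cost_recovery_price(fixed_costs: float, unit_cost: float, expected_sales: int) -> float:
    """Price at which a conservative sales estimate recovers all costs."""
    if expected_sales <= 0:
        raise ValueError("expected_sales must be positive")
    return unit_cost + fixed_costs / expected_sales


def marginal_cost_price(unit_cost: float) -> float:
    """Price covering only the cost of supplying one more copy."""
    return unit_cost


if __name__ == "__main__":
    fixed_costs = 100_000.0  # hypothetical database and mastering costs
    unit_cost = 5.0          # hypothetical per-copy cost
    expected_sales = 400     # deliberately conservative estimate

    full = full_cost_recovery_price(fixed_costs, unit_cost, expected_sales)
    marginal = marginal_cost_price(unit_cost)
    # Under marginal cost pricing the fixed costs must come from a subsidy.
    subsidy_needed = (full - marginal) * expected_sales
    print(full, marginal, subsidy_needed)
```

The gap between the two prices, multiplied by the sales estimate, is the amount a subsidy would have to cover.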

Many producers may use a range of these strate- 
gies within the same organization, and certainly 
within the electronic information marketplace some 
use of all of these strategies is evident. In particular 
publishers may be more concerned with the viability 
of the portfolio of products than with the viability of 
the individual product. Thus, when considering the 
introduction of electronic products it is important 
that the revenue from these products should cover:

a) costs of producing such products 

b) any lost revenue from equivalent printed products.
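
The two-part condition above can be written as a simple check. The following is a minimal sketch in Python, not a costing model: the function name and all of the figures are hypothetical and are chosen only to illustrate the comparison.

```python
def electronic_product_is_viable(revenue, production_costs, lost_print_revenue):
    """Return True when revenue covers (a) the costs of producing the
    electronic product and (b) any revenue lost from equivalent
    printed products."""
    return revenue >= production_costs + lost_print_revenue


# Hypothetical figures in pounds, for illustration only.
print(electronic_product_is_viable(120_000, 70_000, 40_000))  # True
print(electronic_product_is_viable(120_000, 70_000, 60_000))  # False
```

In the second call the product covers its own production costs but not the print revenue it displaces, so it fails the test as stated.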

However, the situation is not quite that simple. 
Each involvement in electronic publishing may be
viewed as a necessary investment for the future and
development costs may be ‘written-off’ over an
extended period of time. Furthermore some repre- 
sentation in the electronic information marketplace 
may be important in conveying the image of a dy- 
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by monthly, quarterly and annually in print as a 
record of books published each year. Their database 
is also used to provide records of forthcoming books 
to the library suppliers and to the British Library for 
inclusion in the British National Bibliography. Ret- 
rospective records can be sold to libraries and 
bookshops who are automating their records. This 
example illustrates how the cost of database creation 
can be spread across a range of products and serv- 
ices, with the publisher maximizing the use of the 
data. Attribution of proportions of cost to any one
product is clearly difficult, and the publisher will
be concerned about the profitability of their activi- 
ties overall. 


Conclusions 

This article has explored the relationship between 
costs and price in the publishing of electronic docu- 
ments, with a view to identifying the cost factors that 
may influence the viability of electronic documents, 
in the context of reference and bibliographic 
databases. It is important to recognize that it is not 
sufficient to seek to identify whether print or elec- 
tronic documents are the cheaper to produce. In 
assessing the viability of electronic publishing it is 
important to consider all costs and to view the 
electronic product in its context as part of a pub- 
lishing portfolio. Further research into the 
relationships between the relative significance of 
each of the cost components identified would con- 
tribute to the assessment of the likely success of 
electronic publishing. 
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get the high sales volume. Costs per copy may be, for 
example, 50 pence for the disc and case, and 25-30 
pence for labelling. On an edition of 10,000+ the 
cost per copy, even with marketing costs, is about
£1.80. The relative economies of print v CD-ROM,
then, depend to a considerable extent upon the physical
reproduction costs. However, the question of
which is the cheaper medium, print or CD-ROM, is 
not the same as whether CD-ROM publishing is a 
viable proposition. In order to explore this issue it is 
important also to identify overhead costs. 

Overhead costs associated with running a pub- 
lishing operation can be allocated to two activities: 
commissioning and editing, and marketing. Increas- 
ingly editorial work for reference and bibliographic 
databases, together with production and design, is 
sub-contracted. The only permanent editorial staff 
employed by the publishing companies are the
commissioning editors who commission new titles and
coordinate the sub-contracted editorial, production 
and design work. 

Marketing staff also have a role that overarches 
entire lists or programme areas. They have responsi- 
bility for maintaining contact with the market, 
particularly contact with the major chains of book- 
shops and library suppliers. Some marketing may be 
applicable to a complete list, whereas other promo- 
tion efforts may be focused on specific titles. Thus it 
may be possible to allocate some marketing costs to 
specific titles, whereas other costs may be viewed as 
more generic. There is a complex relationship be- 
tween marketing costs, sales, revenue and thereby 
profit. 

The successful publisher is able to spread fixed
costs, production costs and overheads across a range 
of products and services and publications in different 
formats. The bibliographical publisher, J Whitaker
& Sons, has a database which is vital to the book trade
and libraries. It is the prime source for information
on book availability, new titles, price changes, etc in
the UK. From the database, Whitaker produces the
in-print listing Whitaker's Books in Print in printed
form (annual), microfiche, CD-ROM (both monthly) 
and an online database accessible via Blaiseline and 
Dialog. They maintain retrospective records of books 
which have gone out of print in microfiche, CD-ROM
and online formats, and issue lists of newly
published titles weekly in The Bookseller, followed 
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Consider for a moment the complex process of mov- 
ing one's personal effects from London to New 
York. Normally one would sign a single carriage
agreement specifying that goods should be transported
to a specific address. In fact, the goods will
probably be picked up by a removal company owned 
by one company, loaded onto a lorry owned by a 
second, driven over roads owned by the public,
placed in a warehouse owned by yet another company,
placed into a shipping container owned by
another, loaded onto a ship owned by another, etc. 
Furthermore, most of these persons will have little 
knowledge of what cargo resides in the 'container' 
which is used. Neither will they know the ultimate 
source or destination of the cargo, being concerned 
primarily with performing their unique role in the
carriage process. 

The numerous legal relationships arising in this
process can be illustrated by a few simple questions.
What can the shipper do if goods are damaged by the
removal men, or by a broken axle on the lorry, by
holes in the road, a leak in the warehouse roof, or if
the ship is lost at sea? For that matter, who will be 
liable if contained within the cargo are
items which may not be exported without licence or 
whose import into the US is prohibited? The answers 
to some of these questions will seem obvious to 
many, but it should be borne in mind that these 
answers are the product of hundreds of years of legal 
experience dealing with the process of shipping tan- 
gible goods. 

The basic process of analysing Internet law, however,
is little different. First one has to focus upon
actors, actions, and legally significant relationships. 
Saying that an event has taken place ‘on the Internet’ 
does not advance the analysis at all. In fact, some 
person will have directed information to be transmitted
or made information available for retrieval. The
information will have been handled by a variety of 
service providers who avail themselves of physical 
infrastructure leased from those who own national
networks of copper cable. As with the shipping ex- 
ample, numerous persons will be involved in shipping 
the information 'cargo' from one point to another. 

As with all metaphors, however, this one breaks 
down in a few places. First, the speed of delivery is
very much faster for the Internet than for tangible


Centre, London. 


Introduction 
Legally speaking, there is no such ‘thing’ as the 
Internet. 

If one ever doubted that the law is an ass, surely 
this statement must provide ample evidence. Who 
among us has not heard of the world’s largest computer
network and arguably the precursor to the global
information infrastructure? Some among us may even 
have experienced Internet email or browsed the World 
Wide Web. How then can the law be so blind to the 
very existence of that which so many of us take for 
granted? 

As others today will no doubt explain in detail, 
the Internet as we know it is not really a single 
computer network nor is it an information service. 
Rather it is a series of networks each owned and 
operated by different people which carry content 
generated by millions of users — those who are 
‘plugged in’. Each of these persons plays some small 
role in making the Internet what it is and each oper- 
ates with little detailed knowledge of or control over 
the actions of the others. The Internet can be concep- 
tualized as the sum of all of these disparate parts. 

The fact that no single person or organization 
owns or controls the Internet produces some interest- 
ing legal results. The most obvious result is that if 
one suffers a wrong due to something which has 
happened ‘on the Internet’ (a meaningless phrase as 
we shall see below) there is no single person
against whom one can bring legal action for redress. 
There is no 'Internet plc' one can sue for damages or 
enjoin from further action. 


The cargo shipping metaphor 

This basic legal structure presents numerous chal- 
lenges to lawyers and non-lawyers alike in 
determining how existing laws apply to the use and 
operation of the Internet. Nonetheless, I believe that 
a well-understood historical precedent is available in 
helping us to understand the workings of 'Internet 
law’. I refer to the business of containerized ship- 
ping. 

As with the Internet, there is really no such thing 
as the ‘International Tangible Goods Shipping Network’.
There is instead merely a collection of persons
who own delivery vans, warehouses, shipping con- 
tainers, ships, cranes, hoists, roads, bridges, etc. 
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driven in large part by the software industry in an 
effort to reduce incidents of 'piracy' and resulting 
lost revenues. In response most countries of the 
world have amended their copyright laws to make 
clear that works stored in electronic media are subject
to protection. To take a simple example, it is now
beyond doubt that electronically mailing the text of a 
newspaper article to a list of one's friends will impli- 
cate copyright law and will most likely be an 
infringing act. One of the more interesting questions 
presented by the Internet, however, is the problem of 
‘virtual copying’. This is illustrated by the 1993
Frena case. 

In the US case of Playboy Enterprises v Frena, 
Frena was the owner and operator of a computer 
bulletin board service. This bulletin board was not 
connected to the Internet, per se, but bulletin board 
cases struggle with many of the same issues pre- 
sented by the Internet and, legally speaking, it is 
difficult to draw any principled distinction between a 
computer bulletin board accessible via modem and a 
file server (such as a World Wide Web home page) 
accessible via the Internet. Frena allowed a number 
of people to access his computer bulletin board via 
modems and the bulletin board contained a vast 
repository of computer files which could be retrieved 
by any of his customers. Not all of these files were 
placed onto the computer by Frena, for he allowed 
each of his customers to ‘upload’ their own files to 
the bulletin board as well. Playboy Enterprises dis- 
covered that a number of files situated on Frena's 
computer and accessible by his customers were digi- 
tized graphic files containing copyright photographs 
from Playboy publications. Playboy sued Frena for 
copyright infringement. 

Frena asked the court to dismiss Playboy's claim 
on the grounds that he had not placed the copyright 
files onto his computer. Rather, he claimed, some of 
his less scrupulous customers had placed them there 
by ‘uploading’ them via the phone line. Frena lost. 
The court explained, quite rightly, that so-called 
‘primary’ copyright infringement is a strict liability 
offence. It is no defence to the charge to say that one
did not realize that the works in question were
protected. (This is equally true under UK law.)

What the court failed to address in its decision, 
possibly because his lawyer did not present the 
argument, was the possibility that Frena had not copied
anything at all. The court seemed to assume that,
simply because Frena's computer had created infringing
copies, Frena had made the copies
himself. The court either ignored, or was not made 
aware of, the argument that Frena's customers had 
made the copies in question by issuing appropriate 
instructions to Frena's computer. In other words, it 
could be argued that Frena was operating in a simi- 
lar capacity to one who owns a photocopying shop. 
Simply using another's copier should not result in 
primary infringement liability to the owner of the 
copier. Such cases are normally tried under the 
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goods shipment. Shipping experts speak of delivery 
times in terms of hours, days or weeks while Internet 
delivery times are expressed in seconds, minutes or 
occasionally hours. Second, it must always be borne 
in mind that Internet ‘cargo’ is an intangible. Stated
differently, in a shipping network there is only one
'cargo' which can be physically located at any given 
time while with the Internet the item delivered is 
actually a copy and there is no need for destruction of 
the original. Nonetheless, I believe that the metaphor 
is useful in helping to understand the application of 
existing law to the Internet. 

Having identified the various actors and actions, 
one then analyses the application of existing laws to 
the result. This looks simple at first glance, but can 
produce some interesting (and counter-intuitive) re- 
sults. One recurring issue in the area of Internet 
liability is the possibility for ‘virtual action’. By 
virtual action I mean the process of remotely in- 
structing a computer to carry out activities on one's 
behalf. This theme will be explored in a bit more 
detail when discussing the Frena case. 


Existing Internet regulation 

For the reasons spelled out above, I believe that it is 
a fallacy to speak of the Internet as being ‘unregu- 
lated’. On the contrary, use of the Internet may be 
one of the most regulated activities in the history of 
mankind since it implicates so many different fields 
of endeavour simultaneously. For some reason, however,
there seems to be a widespread myth that laws
do not apply to 'new' technology. This myth is 
probably fed by the occasional report of a person 
escaping legal liability because an older law fails to
operate when applied in a specific situation, but it 
should be stressed that such examples are the excep- 
tion rather than the rule. To take an example, fraud is 
still fraud whether perpetrated face-to-face, via a 
telephone call, or by electronic mail. Furthermore, 
the constituent technologies which make up the 
Internet have existed in some form for quite some 
time: decades in the case of telecommunications 
networks and for more than twenty years in the case 
of computer hardware and software, and storage of 
information by electronic means. Invariably there 
will be circumstances where laws will apply to the 
Internet counter-intuitively, but nonetheless they will 
apply. Users proceed at their peril since ignorance of 
the law is not an excuse. 

In this section of the paper, I will review some of 
the legal issues which arise in respect of the use of the
Internet and how some of these issues have been 
addressed in recent court cases. 


Copyright 

Copyright law has had to adjust, perhaps more than 
any other, to the advent of the information age. As
computer technology became ubiquitous in the 1970s 
and 80s, the law of intellectual property struggled to 
keep up. Changes, where they were needed, were 
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the same name and very early in the history of the
Internet the decision was taken to delegate machine 
naming responsibility to every person whose net- 
work is connected. In order to avoid conflicting 
names, each user registers a unique domain name
such as ‘cliffordchance.com’, or ‘aslib.co.uk’. In 
each case the registrant has asked for a unique iden- 
tifier that will drive a key part of its identity on the 
Internet. As an example, the machine hosting the
Clifford Chance home page is named ‘www.cliffordchance.com’
and the machine handling Internet
email for Clifford Chance is known simply as 
‘cliffordchance.com’. This has important everyday 
results including clear identification of the 
organization involved. 

Some recent incidents have highlighted the prob- 
lems of domain names as they start to intersect with 
long-standing usage of trade marks. Most famous
was the incident where a magazine journalist registered
the domain name 'mcdonalds.com' and
established his email address as ‘Ronald@mcdonalds.com’.
Not surprisingly, the proprietor of the golden
arches was not pleased by this development. They
were even less pleased when they learned that the 
appropriate Internet domain name registration 
organization operates on a first-come, first-registered
basis without regard to existing intellectual
property rights. The situation was eventually resolved
amicably.
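
The first-come, first-registered rule described above can be sketched in a few lines of Python. This is an illustration only, under stated assumptions: the class and method names are hypothetical, and no real registration organization's procedure is being reproduced. What it shows is that the only question asked is whether the name is already taken; trade mark ownership is never consulted.

```python
class DomainRegistry:
    """Hypothetical registry that allocates names strictly in order of request."""

    def __init__(self):
        self._registrants = {}

    def register(self, name, registrant):
        """Record the name for the registrant if it is free; return True on success."""
        key = name.lower()  # domain names are not case-sensitive
        if key in self._registrants:
            return False  # already taken, whoever owns the trade mark
        self._registrants[key] = registrant
        return True

    def registrant_of(self, name):
        """Return whoever registered the name, or None if it is unregistered."""
        return self._registrants.get(name.lower())


registry = DomainRegistry()
print(registry.register("mcdonalds.com", "magazine journalist"))  # True
print(registry.register("McDonalds.com", "fast food company"))    # False
print(registry.registrant_of("mcdonalds.com"))                    # magazine journalist
```

Any remedy for the second applicant therefore lies outside the registry, in trade mark law or in negotiation.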

The incident points up another difficulty, however.
While exact legal standards vary from country
to country, the touchstone of trademark protection
and liability relates to the identification, or misidentification,
of a type of goods or services. In many
cases different people ‘own’ a mark in relation to
different types of products since there is no likelihood
of confusion. For example, an electronics shop
called ‘McDonalds computers’ could advertise its 
stock without fear of reprisal from the fast food 
company since they distribute different types of goods 
and services. Only one of them, though, can register 
the domain name ‘mcdonalds.com’. Note, however,
that if another fast food restaurant were to register 
and use the mcdonalds.com domain name in a man- 
ner likely to promote consumer confusion, there 
would be a number of legal remedies potentially
available to McDonalds. Eventually, a court with
appropriate jurisdiction might order the transgressor
to desist from using the domain name in dispute.

Another trademark incident which is proceeding 
currently as a court case in the US is MTV Networks 
v Curry. Adam Curry was a presenter on the internationally
known MTV music video television
network. While an employee, he registered the domain
name ‘mtv.com’ and began accepting limited
email as part of occasional Internet promotions. Curry 
and the network came to a parting of the ways and 
now they are involved in a dispute over who will
have rights to use the domain name. While the case 
has not yet been decided, a pre-trial ruling by the 
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rubric of ‘secondary infringement’, or aiding and 
abetting infringement. In secondary infringement 
cases it is an excuse to claim no knowledge of the 
copying activity. 

The Frena case illustrates nicely the problem of 
‘virtual action’ mentioned above. It is at first coun- 
ter-intuitive to consider that a person can be directly 
responsible for the operation of a machine physically 
separated from that person by a space of miles.
Nonetheless, it is argued that this is the more reasoned
approach and that persons such as Frena should
be held to the legal standards for secondary infringe- 
ment. The Frena case was apparently settled by the 
parties before an appeal could be mounted, but other 
courts undoubtedly will be given the opportunity to 
examine this issue in due course. 


Trademark 

The issue of trademarks and the Internet arises in two 
main contexts. First, there is the question of what 
impact this new communications medium has upon 
existing licensing and marketing arrangements. Sec- 
ond, there is an ongoing set of concerns related to the 
use of Internet ‘domain names’. 

Trademarks are uniquely associated with the process
of advertising and distributing products and
services. Rights to use marks in a certain way are 
often a key point in negotiating distribution and 
franchise agreements. Further, there is generally no 
international recognition of trademarks registered or 
used within a single country. Some very famous
marks are owned by, or licensed to, different persons
in different territories and many long and bitter international
court cases have been fought establishing
the rights to use certain trademarks in a certain 
manner. As a result many marks are used by different 
companies in different territories pursuant to care- 
fully negotiated and complex licensing arrangements. 

The Internet, however, represents a new type of 
distribution and advertising medium which is not
tied to any one physical location. A World Wide
Web page using a registered mark and advertising a
certain product is accessible everywhere in the world. 
This can produce some very tense moments as peo- 
ple reach quickly for their franchise agreements to 
discover what recourse they have against the user, if 
any. In this case the individual wording of distribu- 
tion and franchise arrangements may result in no 
party being allowed to use a mark in an Internet 
advertising medium, or allow all distributors/ 
franchisees to use the mark in this way, or the result 
could be ambiguous. As such agreements come up 
for renewal this will undoubtedly be on the agenda 
for resolution. 

Domain names are a related, but different con- 
cern. As will no doubt be explained by others, each 
computer connected to the Internet has its own unique 
name. The root level of this name represents the 
Internet ‘domain’ in which the machine resides. For
technical reasons, no two machines can have exactly 
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to large organizations. The law in the UK was 
amended recently to make it more clear that pornog- 
raphy laws apply to activity which takes place via 
data networks such as the Internet, but this amend- 
ment is quite broad and makes it an offence to 
‘transmit’ pornographic image data. It is not clear 
whether this standard will apply only to the indi- 
vidual who presses the 'Send' key on his email 
software, or whether it might also apply to those 
whose equipment is involved in the ‘transmission’ of 
image data. In any case, employers should be careful 
to warn members of staff that trafficking in ‘compu- 
ter porn' is absolutely forbidden on company 
machines and local area networks. 


Other regulation 

In the short space of this paper, it is of course
impossible to canvass the application of every law
conceivably applicable to the operation and use of 
the Internet. The main thrust of this section has been 
to hit some of the more obvious and pressing con- 
cerns. Examples of other ‘Internet regulation’ would 
include telecommunications regulation, potentially 
‘television-type’ broadcasting regulation (as the term
‘broadcasting’ is quite broadly defined in the Broadcasting
Act 1990), data protection legislation related
to the collection and distribution of personal data, the 
law of fraud and misrepresentation, financial services
regulation, and consumer protection regulation.


Security: identifying the risks 

Internet security concerns can be divided broadly 
into three categories: intrusion, interception and au- 
thentication. Intrusion is the risk that some person 
connected to the Internet will attempt to ‘enter’ your 
own network via your Internet connection. Popularly 
(if inaccurately) known as ‘hacking’, this is the type 
of activity that prompted banner headlines through- 
out the 1980s and forced the reform of laws around 
the world as most could not cope with the concept of 
‘breaking and entering’ into a computer when no 
physical contact had taken place with the machine 
and no physical damage had resulted. Interception 
describes the risk that messages transmitted via the 
Internet may be copied by unauthorized third parties 
without knowledge of the sender or recipient. Fi- 
nally, authentication describes the problem of 
confirming the identity of a party who sends or 
receives messages. 

Quantifying the risk of intrusion is a source of 
great disagreement within the Internet industry. Serv- 
ice providers argue that many organizations face 
greater risks from physical intrusion or in-house 
fraud than from virtual intrusion via an Internet 
gateway. This argument is difficult to answer, but it 
must be understood that connecting a local area 
network to the Internet results in a qualitatively 
different type of risk than that presented by physical 
intrusion. For one thing, the potential group of ‘in- 
truders’ is increased to the many millions of people 
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judge sheds some interesting insight into how quickly 
fortunes can be won or lost on the Internet. Curry is 
claiming (among other things) that company 
executives ceded any rights they might have had to 
the use of the domain name to him in an early 
meeting concerning MTV’s involvement in the 
Internet. The judge has not yet ruled on this claim, 
but he refused to dismiss Curry’s contention out-of- 
hand noting that what may appear commercially 
irresponsible today may well have appeared inno- 
cent enough only a few months ago. The case is not 
yet resolved, but the judge clearly recognizes that 
estimates of the Internet’s value and future have 
shifted radically in only a few years. This early ruling 
points up the necessity of thinking carefully about 
how an organization will present itself in an online 
environment so as to minimize potential long-term 
damage to its image and strength of branding. 


Defamation

Application of defamation law to the Internet will 
present certain technical challenges to lawyers, but 
there is no doubt amongst practitioners that liability 
will attach in certain circumstances. Here in the UK, 
two recent data network defamation actions have
been settled. In the first, a large UK grocery
retailer settled a claim brought by a customer due to 
statements which were circulated within the compa- 
ny's internal email system. A single store manager 
allegedly questioned the credibility of this customer
in email which was circulated to managers of other 
shops in the chain. The customer learned of this 
electronic accusation and filed suit. The terms of the 
settlement were not disclosed. 

In the second case, a UK academic had sought 
damages against a Swiss academic due to allegedly 
defamatory statements posted to the Usenet (a type 
of bulletin board system operating primarily via the 
Internet). Again, the terms of the settlement were not 
disclosed. This second case would have presented 
interesting issues related to international electronic 
publication since the Usenet is accessible in virtually 
every country in the world. On a strict legal analysis, 
it would appear that the Swiss academic had effec- 
tively ‘published’ his remarks in each of these 
countries and was potentially exposed to defamation 
liability in each one. International defamation risk is
one which is well understood in the publishing in- 
dustry, but may not be fully appreciated in other 
industrial quarters. 


Pornography

One area where the application of law to the Internet
has been especially active is that of restrictions on
pornography. There have already been a number of
high-profile arrests both here and in the US concerning
electronic distribution of pornographic images.
Most members of this audience would not normally 
concern themselves with this area of regulation, but 
it should be stressed that it presents risks especially 
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decode the contents of a single message.) Along with 
certain technical constraints, there are a number of 
laws which regulate the use and export of encryption 
technology. A discussion of this issue is clearly 
beyond the scope of this paper, but it should be borne 
in mind that these factors create stumbling blocks in 
the path of encryption-based security solutions. 


Connecting to the Net — a legal checklist 

As outlined above, there are a number of basic legal
issues which should be addressed before becoming 
connected to or making use of the Internet. In the first 
instance, review whatever form of service agreement 
is provided by your Internet service provider or Web 
hosting service. Examine closely the claims made in
this agreement and understand how the service company
relates to the Internet as a whole. Will your
network or home page ‘live’ at the end of a data 
superhighway, or will it struggle to convey messages 
via a data dirt path? Do you understand and agree
with the charging mechanism used? If charges are 
based on traffic volume, has the service provider 
explained that volumes can increase exponentially
depending upon the type of traffic (e.g., text, graphic, 
computer file, video) sent via the network? Does the 
agreement make a reasonable apportionment of the
various risks outlined above? What service guaran- 
tees are made, if any? Do these extend to network 
interconnection arrangements, or apply solely to the 
service provider's network? 

If you are launching a World Wide Web home 
page, make best efforts to ensure that your page's 
content is copyright-cleared. In many cases, organi- 
zations are surprised to learn that they do not own 
unrestricted rights to marketing material which they 
produce or which is produced on their behalf. In
addition, make a clear statement asserting your own 
rights in the material placed in the home page envi- 
ronment. One should specify the scope of the licence 
granted to those who ‘browse’ your pages, but be 
careful not to make the licence right too technology-specific.
Web browsing and caching technology is
currently progressing in a number of different directions
and it is probably impossible to identify with
specificity exactly how many and what type of ‘copies’
will be acceptable for the end-user to make.
Licence terms should provide linkage to types of use
made of the copies which are produced in the process
of browsing. 

Make certain that company employees are instructed
concerning the danger of breaching
copyright, defamation, data protection, or pornography
laws in simple email messages. A common
example where copyright concerns can arise would 
be the employee who ‘cuts’ a page from an electronic 
news source, 'pastes' it into an email message, and 
forwards this to others. 

If your web page contains forms which request
any type of personal data whatsoever, be certain that
your data protection compliance officer has reviewed 
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with an Internet connection. Each is only a keystroke 
away from your local area network, and a certain 
group of these people have proven their malicious 
intent time and time again. Vendors often speak of 
‘firewall’ technology as the solution to intrusion 
risk, but it must be borne in mind that this only 
reduces risk and does not eliminate it. Similarly, of 
course, placing locks on office doors only serves to 
reduce risk and does not eliminate it. The main point 
to bear in mind is that connecting your organiza- 
tion’s network to the Internet is like cutting a new 
doorway into your office building. You need to be 
satisfied that the new entrance is adequately secured 
and guarded. 

Interception is also a difficult risk to quantify. 
Historically, Internet networks have been viewed as 
more prone to interception attack than other types of 
communication infrastructure. Many persons close 
to the industry believe that the Internet is less secure 
than a telephone call, or fax, and may be less secure 
than more traditional X.400 electronic mail networks. 
Nonetheless this view is changing as network service 
providers begin to incorporate advanced security 
equipment into their network architecture. 

Note that it is not a sufficient answer to the 
interception risk to claim that searching for messages 
is made more difficult by the volume of message 
traffic carried on the networks. Each Internet network
is really a series of computers making copies of 
messages, and a common method of interception 
attack is to break in to one of these computers and 
install a ‘sniffer’ program. This program sits unob- 
trusively and does not interfere with the normal 
handling of messages, but it can be set to scan for 
specific examples of message traffic which are then 
copied and emailed to the intruder for further action. 
Typically these programs have been set to scan for 
the word ‘password’ in an effort to gain information 
for an intrusion attack. More recently, some of these 
programs are designed to look for strings of 16 digits 
set in groups of four within the body of an email 
message (i.e., a credit card number). Interception 
attacks thus take advantage of the very technology 
which makes the Internet work in order to locate 
‘needles in haystacks’. 
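
The pattern described above is simple enough that an organization could apply the same check defensively, for example to flag outgoing mail that appears to contain a card number. The following is a minimal sketch under stated assumptions: the function name is hypothetical, and "groups of four" is taken to mean four 4-digit groups that are run together or separated consistently by single spaces or hyphens.

```python
import re

# Assumption: a "credit card number" here is 16 digits in four groups of four,
# separated consistently by a space, a hyphen, or nothing at all.
CARD_LIKE = re.compile(r"(?<!\d)\d{4}([ -]?)\d{4}\1\d{4}\1\d{4}(?!\d)")


def contains_card_like_number(body: str) -> bool:
    """Return True if the message body holds a 16-digit, 4x4 grouped string."""
    return CARD_LIKE.search(body) is not None
```

A check of this kind only recognizes the layout of the digits; it does not establish that the number belongs to a real card.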

Authentication tends to be an issue of primary 
concern when conducting anonymous electronic com- 
merce and is really not very different from traditional 
risks which have been long identified by those in the 
financial services industry. 

One solution which is commonly proposed for 
the problems of interception and authentication (and 
to a lesser extent for intrusion) is the use of encryption 
technology. Modern digital cryptographic techniques 
make it possible for two parties to an Internet 
‘conversation’ to exchange data which can be easily 
intercepted, but is economically infeasible to read. 
(An example of ‘economically infeasible’ might in- 
clude a scenario where 100 computer workstations 
acting in unison over a 3 month period are required to 
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similar note, consider the role of hypertext links built 
into your page that point to third party sites. You will 
have no way to control what is at the end of this link, 
but have you given your readers sufficient notice of 
this fact? Aside from the possible legal liability which 
may arise from facilitating the transmission of the 
material at the end of the link, there is the important 
business issue of distinguishing ‘your’ information 
from information provided by others. 

Finally if you are connecting your network or a 
series of individual computers to the Internet, be 
certain to consult with your risk management team to 
determine what measures and additional resources 
will be necessary to 'secure' the connection from 
intrusion. 


I wish to express my gratitude to Christopher Millard of 
Clifford Chance for his continual support, guidance, 
and good humour over the last three years and to 
acknowledge the central role he has played in shaping 
the ideas presented within this paper. Errors and 
omissions remain the sole responsibility of the author. 
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the process used to acquire the information and has 
had the opportunity to suggest the inclusion of cer- 
tain notices. Similarly, data protection officers should 
review the status of their data protection registrations 
to ensure that these are up to date and are sufficiently 
broad to encompass the proposed activity. 

If you plan to advertise your products or services 
on a Web site and the advertising involves use of 
brand names or trade marks, be careful to review 
licensing and franchise agreements to ensure that 
you are not violating rights held by others. Submit 
your World Wide Web advertising copy to the same 
review process that is currently in place to review 
material produced for other advertising media. If you 
operate in an industry whose advertisements are 
heavily regulated (e.g., financial services), you may 
wish to seek specialist advice on how these regula- 
tions will apply. If you are launching a web page, 
retain ‘ownership’ of the page's address (URL) to the 
extent that this is technically practicable. 

Consider the possible utility of, or regulatory 
necessity for, various disclaimer statement announce- 
ments on a Web page or within email messages. On a 
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service provider! Historically, buying things ‘online’ 
has been the preserve of the mail order companies — 
and very successful they have been too, selling a 
huge range of products over the telephone. As well as 
the more obvious catalogue companies such as 
Littlewoods and Kays, there are hundreds of others 
whose leaflets cascade from the colour supplements 
every weekend — like Innovations and Racing Green 
— or fall through our letterboxes with depressing 
regularity. 

Typically such purchases are quite high in value 
— £15 for a book, £30 for some software, £20 to send 
some flowers. And when you think about it, we do 
not buy very much this way, so there are relatively 
few transactions. 

But there is also a market that has not yet been 
tapped — the market for information. Paying ten 
pence for the latest weather map, five pence for an 
up-to-date valuation of your share portfolio or a 
penny for the latest traffic update could become quite 
common — if only there was a cost effective way of 
carrying out such transactions. They are very low 
value, and potentially there could be huge volumes 
of them. Today's equivalent of this market is the 
0891 phone network, where half of the cost of the 
phone call we make is passed to the service provider. 


3. Service provider requirement 
The service provider would ideally like to accept any 
payment mechanism that suits the consumer. But 
probably the most important issue for him is guaran- 
teed payment, and ideally he would like to receive 
immediate value too. It goes without saying that the 
whole payment system should be fraud proof, 
auditable, easy to use, and of course he does not want 
to pay anything to a bank (who does?). 

In today's online environment, there is therefore 
a clear requirement for a payment mechanism that 
has the convenience of cash — immediate, guaranteed 
payment — and can be sent easily and cheaply in 
electronic form. 


4. Existing payment mechanisms 

Of the existing payment mechanisms, the most obvi- 
ous ones to use online are credit cards and debit 
cards. What could be easier than simply quoting the 
card number — just as you do on the phone? But there 
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1. Introduction 

I have been asked to talk about payments on the 
Internet but, as you will see, I plan to widen the 
scope. The subject is so big that there are entire days 
being devoted to it at conferences already. Indeed, it 
is only a matter of time before there is a whole 
conference... 

My aim today is to give you a high level over- 
view, and I will start by looking at the subject from 
the point of view of the consumer — the man/woman 
in the street. I will then look from the information or 
service provider’s viewpoint, and consider the ap- 
propriateness of both old and new payment 
mechanisms. 


2. The consumer requirement 

The consumer has access to the Internet, to 
CompuServe, Europe Online and several other net- 
works — now including Microsoft Network. As 
companies like Videotron and Cambridge Cable roll 
out new systems, the consumer will also have access 
to interactive services via cable. 

British Telecom are working on a trial of a video- 
on-demand service that, if it is rolled out, will provide 
yet another source of interactive services. There is 
also a number of trials of interactive terminals in 
shops and banks. One of them is being run by NatWest 
and Thomas Cook, with ‘kiosks’ in three of their 
branches and three of ours. 

And the future holds more. Very soon you will be 
able to buy PDAs and laptop computers with a built 
in GSM module, enabling you to log on to your 
office LAN from virtually anywhere. British Telecom 
recently demonstrated a prototype of such a machine 
that you strap to your arm. And if Iridium takes off, 
your wristwatch computer may even communicate 
by satellite! 

This is where I have widened the scope of my 
presentation. From the consumer's point of view, the 
fact that there are many different networks using 
several different technologies is quite irrelevant. To 
the consumer they are all simply ways of getting 
information; ways of purchasing goods and services; 
ways of communicating. They are all ways of going 
‘online’. 

When online, the consumer will be expected to 
buy things — otherwise there is no profit in being a 


Aslib Proceedings, vol.47, no.11/12, November/December 1995. pp.241-243 


Internet payments — the issues 





1994 by schemes in Austria, Finland, Portugal and 
Spain. In 1995 we will see pilots in Belgium and the 
Netherlands, as well as CAFE and Mondex. 

Mondex went live in Swindon on 3 July. It is a 
collaboration between NatWest Bank, Midland Bank 
and British Telecom. As I am therefore rather biased 
in its favour, I should like to spend a few moments 
talking about it. Like most other schemes, Mondex 
relies on smart card technology — but there the simi- 
larity ends. The other schemes insist that all payments 
are 'cleared' back through a central point, which 
controls the system and logs all transactions. In this 
respect, the schemes are similar in operation to credit 
and debit cards. 

Mondex is quite different in that it follows the 
cash model. If I give you a pound coin, guaranteed 
value passes immediately. If I pay you a pound via 
Mondex, guaranteed value passes immediately. There 
is no requirement to clear a cash transaction, or a 
Mondex transaction — or to log them. That is why 
cash and Mondex are so quick and convenient — and 
cheap to operate. 

Cheap? Did I say cash was cheap? For the aver- 
age consumer, cash undeniably is cheap. The only 
real cost is the repair of holes in trouser pockets. But 
for the retailers and banks cash is very expensive. All 
those safes and alarms for keeping it. All those 
secure areas for counting it. The insurance policies, 
the security vans, etc. etc. It has been calculated that 
handling cash costs United Kingdom plc around £4.5 
billion every year. 

So cash is not so cheap after all! Which is why 
retailers in Swindon are so enthusiastic about Mondex. 
And it is not just the big retail chains. The corner 
newsagents are, if anything, even more enthusiastic 
than the big stores. Corner newsagents cannot afford 
security vans to collect cash every day. So the propri- 
etor risks injury or death every night when he carries 
his takings to the nearest night safe. How much 
easier it would be to pay Mondex cash into his 
account over the phone. 

That is where Mondex is unique. It is the only 
new payment mechanism that can compete with cash 
in the high street, and it is the only online payment 
mechanism that is cost effective for transactions as 
small as a penny. 

And it is very easy to use online. All you need is 
a Mondex card and a Smart Mouse — a simple smart 
card reader that plugs into the serial port of any PC. 
That is all service providers need too. Except that 
they will probably need more than one, to cope with 
several concurrent transactions. 


6. Security 

I have already touched on security. How do we cut 
fraud? How can we ensure there actually is a valid 
card and cardholder at the other end of an online 
transaction? But there is another aspect too. How can 
the consumer be sure that he or she is online to a bona 
fide service provider, who will actually deliver the 
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are problems. Card fraud cost the UK banks £100 
million last year. We have had some success in 
reducing this figure — it is down from £130 million in 
1993 — but fraud is carried out on an international 
scale and, as soon as we have plugged one loophole, 
the fraudsters find another. 

One of the ways we fight fraud is by making the 
card difficult to forge, and by making sure there is a 
card. That is quite difficult to do in the online envi- 
ronment. How can we prove over the Internet that 
there really is a valid card associated with every 
transaction? 

Under the current international card rules, unless 
the retailer has physically checked the card and 
signature, payment is not guaranteed. In our jargon 
we call this a ‘cardholder not present’ transaction, 
and we are already seeing big increases in cardholder 
not present fraud — which the retailers pay for, not the 
banks. Which doesn't make it any more acceptable... 

From the banker's and the service provider's 
point of view, credit and debit card transactions are 
not economical for low value transactions. A 10 
penny Visa transaction would cost more to process 
than its face value... 

An alternative way to collect low value transac- 
tions would be to add them to the network provider's 
bill. Videotron could, for example, add all those 10 
pences to your monthly bill for the cable service. 
Until a couple of weeks ago CompuServe used to 
charge for their extended services this way. 

This solution is quite attractive to the service 
provider, although payment would not be immedi- 
ate, but there is quite a high overhead for the network 
provider — accounting to all the service providers, 
sorting out queries and disputes etc. So it's not a 
cheap solution, but it is practical. 


5. New payment mechanisms 

There are currently several groups trying to find 
solutions for the online environment. You may have 
heard of Digicash, which can best be visualized as 
lots of electronic coins or banknotes. Almost uniquely, 
Digicash has no visible presence whatsoever. 
There is no card, no metal disk and no grubby piece 
of paper. It exists only as electrons — which we 
cannot see. 

Another new solution is the CAFE (Conditional 
Access For Europe) card, which has been funded by 
EC cash, under the ESPRIT project. It is a smart card 
based electronic purse scheme running in two EU 
buildings in Brussels. It can be used by bureaucrats 
to buy books from the bookshop, coffee at the vend- 
ing machines and lunch in the canteen. NatWest has 
had a similar scheme running in the canteen in our 
computer centre in London. The scheme was de- 
signed as the testbed for Mondex, and has already 
handled over a million transactions. 

There are many similar smart card based electronic 
purse schemes in Europe. Danmont, live since 
1992 and the longest established, was joined during 
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be now that the bid to purchase Intuit has fallen 
through? 

But it is just possible that the network and soft- 
ware suppliers do not want such deep involvement. 
For all their sins, banks do provide a valuable service 
as guarantors, underwriters and insurers of trade — 
and have been doing so for centuries. As you will be 
aware from the ups and downs of bank profits over 
the last few years, these services are very risky. Will 
the network and software providers be happy to carry 
these risks? 


8. Conclusion 

As you can see, the subject of online payments raises 
some huge issues. The subject is so large that most 
organizations simply do not have sufficient market- 
place clout to have any real impact on the likely 
outcome. The vast majority of organizations is sit- 
ting watching a handful of players as they jostle for 
position. The organizations wish those players would 
hurry up and put the payments infrastructure in place, 
so they can get on with the job of selling their 
products and services online. 

The utter confusion and the multiplicity of standards 
remind me of the early days of the PC marketplace 
—long before IBM set the standard that most of us are 
still using today. Who would have thought that Com- 
modore, whose Pet computer seemed to be 
everywhere, would be bankrupted by IBM - who at 
that time only made huge mainframe computers? 
And who would have thought that IBM would then 
be brought to their knees by the likes of Compaq? 
And all this in no more than half a generation! 

The whole marketplace is again in a state of flux. 
Not one of us can predict with any degree of accuracy 
what things will look like in just two years' time — let 
alone five years'. The only thing we can say with 
certainty is that we haven't a clue who or what will 
eventually become the de facto standard. 

NatWest is a leading player, with a major interest 
in Mondex and a fairly loud voice in Mastercard, 
Visa and the standards bodies. Naturally we would 
like Mondex to become the standard, but it is the 
marketplace that will decide whether or not we will 
be successful... 

Watch this space. We are in for a real roller 
coaster ride. Just like being at the funfair, we should 
smile and enjoy the trip — and hope that it is someone 
else who gets sick... 
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goods? Is there a need for some form of central 
guarantor, who warrants that each transaction is valid? 
If so, how can this be implemented internationally — 
and how can it be funded? 

This issue is already being addressed by the 
credit card schemes. Europay, Mastercard and Visa 
have issued draft specifications for replacing mag- 
netic stripe cards with smart cards. NatWest expects 
to be one of the first banks in the world to issue these 
smart cards in two or three years' time. 

In addition, Visa is working with Microsoft and 
Mastercard is working with Netscape to provide 
more secure methods to pass payment instructions 
across the Internet. 

Let us hope that Mondex, Mastercard and Visa 
eventually agree on compatible solutions. No high 
street retailer wants to have more than one card 
terminal on their counter, and no provider of online 
services wants to have to support a multiplicity of 
security mechanisms. Just think of the overhead! 


7. Payment service providers 

I said earlier that service providers would ideally like 
to accept any payment mechanism that suits the 
consumer. Finding a practical way to accept several 
different payment systems is already a problem for 
high street retailers. Many financial institutions offer 
to help resolve this problem by taking all card-based 
payments from the retailer and sorting them out — for 
a fee, naturally. When you start thinking about the 
issues, you can see that there will be a demand for 
similar services to handle payments online. 

But there is no requirement that such services be 
provided solely by banks. In the online environment 
there are two players who already see everything — 
the network provider and the software supplier. Net- 
work providers are already masters of billing systems 
— BT sends out millions of phone bills every month. 
They could offer to collect all the low value, high 
volume transactions. They could set up — or buy — a 
company to provide payment services online. 

Software suppliers may not have the billing expe- 
rience, but they do have the ability to set up or buy a 
payment services company. Microsoft is already 
working with Visa, and is on record as wanting to 
charge a small fee for any transactions running across 
the Microsoft Network. They have the expertise and 
financial resources to compete with the banks — if 
Bill Gates wants to. I wonder who his next target will 
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made to provide information that will enable the 
same material to be retrieved by a different researcher 
on the basis of the information provided by the 
author’s bibliography. This can only be achieved 
with a more thorough incorporation of additional 
details into the body of the bibliographic entry. 
Among the considerations for later retrieval of a file 
are its physical and symbolic location (these two 
aspects are roughly equivalent to the city and 
publisher entries in conventional references), the file 
name (the title) and the file format. 

Currently the conventions established by the 
World Wide Web (WWW) to access a variety of file 
formats globally provide the most concise and 
informative system of incorporating Internet-based 
resources into referenced works. This system is based 
upon the Uniform Resource Locator (URL). In graphical 
WWW browsers activating the option Show 
Locations reveals the URL of each document being 
accessed. The generic format of a URL is: 

file format://computer.type-of-system.country-code/file-directory/file-name. 

Thus 
http://www.gu.edu.au/gwis/cinemedia/CineMedia.home.html 
refers to 
• a hypertext document (http) 
• based at the Griffith University WWW server (www.gu) 
• which is an educational institution (edu) 
• physically located in Australia (au). 
• The file itself is nested within two directories (/gwis/cinemedia) 
• and is identified by the name CineMedia.home.html 
• The ending html also indicates a hypertext document format. 

The file name is case sensitive, thus cinemedia.HOME.HTML 
does not point to the same file. 

Researchers using this system for their computer-based 
material must be aware of this in referencing 
files in this manner. 
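
The breakdown above can be reproduced mechanically. The following minimal sketch uses Python's standard urllib.parse module to split the example address into the components the list describes. The function name describe_url and the dictionary keys are illustrative assumptions, and the address is written with a slash before the file name, as the description of two nested directories implies.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit


def describe_url(url: str) -> dict:
    """Split a URL into the parts a bibliographic entry would need."""
    parts = urlsplit(url)
    path = PurePosixPath(parts.path)
    host_labels = parts.hostname.split(".")
    return {
        "scheme": parts.scheme,            # e.g. http
        "host": parts.hostname,            # e.g. www.gu.edu.au
        "country_code": host_labels[-1],   # last label of the host name
        "directories": [p for p in path.parent.parts if p != "/"],
        "file_name": path.name,            # case is preserved
    }


example = describe_url("http://www.gu.edu.au/gwis/cinemedia/CineMedia.home.html")
```

Because the file name keeps its case, comparing example["file_name"] with cinemedia.HOME.HTML shows the two are different strings, which is the point made above about case sensitivity.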


The Hyper-Text Mark-up Language (HTML) file 
format 

The World Wide Web's Hyper-Text Mark-up Lan- 
guage (HTML) is becoming the file format most 
commonly used for online academic journals. This is 
a result of the rapid growth of the World Wide Web 


Overview 

Despite the rapid growth of the Internet during 1994 
and 1995 no adequate or consistent method of refer- 
encing material from this source has been developed. 
Failure to address this issue will result in Internet 
resources not being awarded full recognition within 
academic discourse. Unless corrected, the signifi- 
cance of this oversight will be exacerbated as more 
academic journals become available online and more 
computer literate students enter tertiary study. 
Furthermore, the status of researchers who have published 
in this medium will be affected and universities may 
deprive themselves of the staff best equipped to meet 
the challenges of the electronic age. 

The methods used to reference material gained 
from the Internet should echo existing referencing 
styles. This consistency will improve the readability 
of references to Internet-based resources and will not 
distinguish the material solely because of its contem- 
porary distribution method. This paper proposes the 
development of a consistent bibliographical refer- 
encing method which emerges from available 
information in Internet-based file formats including 
the Hyper-Text Mark-up Language (HTML). If 
adopted, it will avoid the necessity for inclusion of 
the computer file label which has become a de facto 
and inadequate solution to a complex problem. 

Although there are a large variety of referencing 
systems available the solution being proposed is 
consistent with the Guide to Referencing (Dow, 1995) 
and the Australian Government Publishing Service's 
Style Manual (1994). The work of Li and Crane 
(1993) currently represents the only consistent system 
for referencing electronic resources. Its publication 
prior to the wide adoption of HTML inhibited the 
development of a concise approach to Internet 
resources which might otherwise have been possible. 


The problem 

Distinguishing material as a computer file has a 
limited utility in its acknowledgement of the need for 
additional tools to display the material. This generic 
label, however, ignores the variety of different com- 
puter platforms and file formats that currently exist. 
Few of these formats are interchangeable with other 
formats or the specific readers that are used to decode 
the files. Distinguishing material as a computer file 
is not a sufficiently informative pointer for referencing 
computer-based material. An effort should be 
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tution most acknowledged for its assistance in the 
HTML document. 

The educational (.edu) site is, however, more 
likely to be the source of the HTML file. The site 
referred to by the URL: 

http://www.gu.edu.au/gwis/hub/hub.home.html 

contains institutional information. This is however 
rather cryptic. There are Internet resources available 
which allow this site name to be searched for with a 
real world name being the returned result. This form 
of retrieval is dependent on the site being registered 
with one of these indexes. This additional research 
requires detailed knowledge of the Internet’s avail- 
able resources — a condition which should, perhaps, 
preface any situation in which Internet resources will 
be referenced. 

In attempting to make references to Internet 
material retrievable at a later date the difference 
between citing hub.home.html and Humanities HUB 
(the title of the above example) is significant. The 
full URL allows a recognition if it was correctly 
noted in the bibliography. The title of the resource 
can be used in a network search using one of 
the better engines, such as WWW Worm, Yahoo or 
Webcrawler. Attempting this form of retrieval 
assumes that the site is registered with one of these 
indexes. A potential advantage in using the document's 
title rather than its URL location is its 
accessibility after a file is moved. Although most 
sites place a pointer to a file's new location when it is 
moved these are usually only maintained for a short 
period. The moved file, however, could be retrieved 
by a network search using the title as the keyword(s). 
The problem of providing ongoing retrievability can 
be hedged with the inclusion of both sets of details in 
a reference. 

The alteration of files by the author could be 
likened to the creation of a new edition at the expense 
of an earlier one. This situation would hopefully be 
relatively rare for academically orientated material. 
However referencing online daily newspapers can- 
not avoid this loss of referred material. In these 
situations the researchers would be advised to main- 
tain a personal archive of Internet material. Students 
submitting coursework who choose to reference from 
the Internet should maintain an archive of their referred 
material. 

Internet resources should be used judiciously. 
Referencing to an article online with its additional 
complexities should not be attempted when there is 
the possibility that a printed version can be obtained. 
This aids in reducing the complexity of biblio- 
graphies that result from the referencing of 
non-traditional mediums. Encouraging the referencing 
of paper versions does not devalue the importance 
of their electronic versions. Online journals can be 
considered as working papers which allow the re- 
searcher to identify articles of interest and relevance 
rapidly. The scarcer paper copy could then be obtained 
and used as a referencing copy, thus allowing the 
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and the ease-of-use of its graphical user interface. 
The basis of the WWW is the Uniform Resource 
Locator. The uniformity of this file identifier 
commends it as the basis for the bibliographic referencing 
of WWW documents. An advantageous 
consequence of complex machine names and URLs 
is that documents written in the HTML format also 
contain a simplified 'real world' title field. This 
information appears in the title bar of the WWW 
browser's viewing window when the site is accessed. 
All HTML documents can be assumed to have a file 
format, a title and a URL by virtue of their existence 
as WWW documents. The ‘voluntary’ or added as- 
pects of the document may include the author's 
name, an institution (in place of publisher details) 
and a form of date. 

Documents with authors present no problem. 
Documents with no personal author will often have 
an institutional body referred to within the document 
itself. This may take the form of a link to another 
WWW site or a direct reference to the institution. 
Sites often contain a hypertext link at the base of the 
file which allows the reader to email the site’s author. 
Using the email address of the author in place of 
actual names provides a unique identifier which can 
be reused meaningfully. Failing these possibilities — 
and this situation would be relatively rare — the 
HTML file contains information which is not 
necessarily displayed directly by the WWW browser. Most 
browsers allow the user to read the document’s source 
code. The source code is usually available through 
a series of commands such as View + Source. As 
HTML is straight text with readable attached layout 
tags it is possible to read the information in the 
<HEAD> section of the document to obtain an au- 
thor’s name or institution. Unfortunately the <HEAD> 
tag in HTML is now optional, reducing the value of 
this last option for the future. 
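
As a rough sketch of reading this information programmatically rather than through the browser's source view, the following uses Python's standard html.parser module to collect the title and any named META entries. The class name HeadReader, the sample document, and the META NAME="author" element are illustrative assumptions. The text above says only that the head section may carry an author's name or institution, not how it is marked up.

```python
from html.parser import HTMLParser


class HeadReader(HTMLParser):
    """Collect the <TITLE> text and any <META NAME=... CONTENT=...> pairs."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag and attribute names in lower case.
        attrs = dict(attrs)
        name = attrs.get("name")
        if tag == "title":
            self.in_title = True
        elif tag == "meta" and name:
            self.meta[name.lower()] = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def read_head(source: str):
    """Return (title, meta dictionary) for an HTML source string."""
    reader = HeadReader()
    reader.feed(source)
    reader.close()
    return reader.title.strip(), reader.meta


# Hypothetical document used only to exercise the sketch.
sample = (
    '<HTML><HEAD><TITLE>Humanities HUB</TITLE>'
    '<META NAME="author" CONTENT="A. N. Author"></HEAD>'
    '<BODY>Text</BODY></HTML>'
)
title, meta = read_head(sample)
```

The parser does not depend on the head tags being present, so the title can still be found in documents that omit them.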

Consideration must be given to the fact that there 
are at least two types of HTML documents on the 
Internet. The most common type is the collection site. 
These sites have no content in the sense of readable 
or research material, as they are simply a collection 
of other sites compiled by the document’s author. 
The value of the better collections is their systematic 
grouping of sites for perusal. These sites are the most 
likely to be anonymous but the least likely to be 
referenced academically. The less common content- 
provider sites contain electronic journals or online 
data. As a result there is a high probability that an 
author and/or an institution will be referred to in the 
text. 

Identifying the publisher of HTML material may 
become increasingly difficult as a result of the com- 
mercialization of the Internet. This process has 
allowed research groups and units to maintain a 
distinct server often with a .com (commercial) suffix 
while still maintaining an association with their origi- 
nal institutions. The first preference in referencing a 
publisher would be to include the name of the insti- 
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The institution/publisher acknowledgment is re- 
placed by an e-journal affiliation. For those e-journals 
that maintain conventional volume numbering, this 
information could be included after the journal's 
name. The URL should still be required regardless of 
the amount of other bibliographic details available. 
This insistence upon a URL provides a recognizable 
label that replaces the computer file tag and provides 
a consistent format for researchers to use. 

Undated sites could be historicized with the less 
than mark (or equal to or less than marks) with the 
year the file was accessed as the given date, i.e. <1995 
or better still ≤1995 or ≦1995. However the second 
two options require a fairly sophisticated word processor 
or handwritten essays. These symbols are 
preferable to the n.d. (no date) or c. (circa) tags as 
they recognize that the referenced material is 
changeable. This, however, is a short term solution. Authors 
of academic documents on the Internet will become 
more aware of the need for researchers to reference 
their material adequately and should begin providing 
full bibliographic data as part of the document's header. 
The format of referencing applied to monographs 
allows these welcome additions to be included 
without the need for a new approach to referencing 
HTML documents. Similarly, extensions to the basic 
referencing details of a monograph such as different 
editions or different publisher (WWW locations) can 
be readily accommodated. 

Page numbers and pages themselves are non- 
existent in HTML files. Documents contained within 
a single file are often referred to in Internet jargon as a 
page. Longer documents are generally broken down 
by their chapters into a series of individual HTML 
files (pages). This current and established Internet 
practice allows some level of pointers to be developed 
with the use of a monograph-derived referencing 
system. As an example: 


Collins, Jane ≤1995, My life as an AOLer - Introduction, 
Institute for Internet Studies, 
http://www.bob.com/~jcollins/aoler1.html 

Collins, Jane ≤1995a, My life as an AOLer - Hackerhood, 
Institute for Internet Studies, 
http://www.bob.com/~jcollins/aoler2.html 

In text this could be referenced as (Collins 1995) & 
(Collins 1995a). 

Although this is messy it replaces page numbers 
with references to multiple same year publications. 
No other immediate solution appears to exist. A more 
sophisticated approach, which requires a greater aware- 
ness of how HTML files are constructed, would be to 
use the filename in place of the page number. For 
example: 

(Collins 1995: aoler1.html) & (Collins 1995: aoler2.html)

could refer to the same bibliographic references as above.

There are additional sophistications which could
be developed with a greater awareness of how HTML
files are coded. The <a name> tags in HTML 
researcher to provide full referencing and page numbering
details. However, this cumbersome method of
referencing will become less viable with strained
library resources and an increasing number of journals
becoming solely available online.

The best methods for referencing HTML files can
be derived from the referencing styles used for monographs.
The author remains as the basis of the
bibliography's order. The year of the original uploading
could be included if it is acknowledged in the document
itself. The title of the reference equates with the
title that appears at the top of the document's window.
The publisher details would be the name of the institution
where the file was maintained, if it is ascertainable
from the document. The place of publication would be
entirely replaced by the document's URL. Providing
the full URL provides a level of redundancy in the
entry which allows the cross-checking of provided
references. This method also enables references to be
made to those sites with no apparent publisher. Using
an article from the C-Theory site as an example, the
complete bibliography entry would be:

Brenner, Anita 1995, The Murder Trial: Genre 

or Event-Scene?, C-Theory, http://english- 

server.hss.cmu.edu/ctheory/e-murder. trial.html 
and would be associated with in-line references that
appear as (Brenner 1995). This example is unusual as it
relates to an e-journal which does not currently maintain
volume or numbering details. If C-Theory changes
this policy the new reference would italicise the title of
the e-journal and include volume/number details.
A minimal bibliographic entry would appear as

The Murder Trial: Genre or Event-Scene? ≤1995,

http://english-server.hss.cmu.edu/ctheory

/e-murder  trial.html
and the corresponding in-line reference would be
(The Murder Trial: Genre or Event-Scene? ≤1995),
which still provides some information for other
researchers. This type of reference would occur when
only the title and the URL were available to the
researcher.

As a slight deviation from conventional referencing,
the final full stop in the bibliographic entry
should be omitted to avoid URL addressing confusion.
To avoid additional confusion the URL should
only be split with a space or soft return after the
forward slashes and not between words of the location.
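
The two conventions above (no closing full stop, and line breaks only after forward slashes) can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than part of the proposed scheme: the function names format_entry and split_url, and the width parameter, are hypothetical, and the sample data reuses the invented Collins reference.

```python
import re

def format_entry(author, year, title, url, publisher=None):
    """Join the details in monograph order, ending with the URL."""
    details = [f"{author} {year}", title]
    if publisher:                      # sites with no apparent publisher skip this field
        details.append(publisher)
    details.append(url)
    return ", ".join(details)          # deliberately no trailing full stop

def split_url(url, width):
    """Break a URL into lines, splitting only directly after forward slashes."""
    # Each piece ends with its slash(es); the last piece is the file name.
    pieces = re.findall(r"[^/]*/+|[^/]+$", url)
    lines, current = [], ""
    for piece in pieces:
        if current and len(current) + len(piece) > width:
            lines.append(current)
            current = piece
        else:
            current += piece
    if current:
        lines.append(current)
    return lines

url = "http://www.bob.com/~jcollins/aoler1.html"
entry = format_entry("Collins, Jane", "1995",
                     "My life as an AOLer - Introduction",
                     url, publisher="Institute for Internet Studies")
print(entry)
print(split_url(url, 30))
```

Rejoining the pieces returned by split_url always gives back the original URL, so a reader can remove the line breaks without guessing where a space belongs.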

This system of referencing does not recognize 
those documents with an e-journal affiliation. As 
files can be reflected (duplicated) at different sites 
with different URLs recognizing the institution as 
the publisher may not always be useful. However 
where periodical-style information is evident or 
available through hypertext links it should be 
acknowledged conventionally. As an example: 

Foucault, Michel 1995, ‘Madness, the Absence
of Work’, excerpts, tr. P. Stasny & D. Stengel,
Critical Inquiry, vol.21, no.2, Winter,
http://www.uchicago.edu:80/u.scholarly/CritIng/v21n2.foucault.html
by the printing command of WWW browsers (but
should become an available option). Among the
proposed features of HTML+, the next version of the
HTML language, is the tag <PRINTOUT>, which, if
it were supported by WWW browsers, would provide
exactly this solution.


Gopher 

The referencing scheme outlined for HTML files is 
equally applicable to the other Internet resources 
which can be accessed via a graphical WWW browser. 
Gopher services, as the text based predecessors to the 
WWW, represent substantial investments of time 
which are not readily transferred to WWW-based 
HTML files. This is not, however, an obstacle to 
accessing the large number of resources available 
through GopherSpace. Many universities still main- 
tain gopher servers which use software other than 
WWW browsers. This software ‘hides’ the server, 
directory and filename information from the user. It 
can be retrieved but may require a level of skill 
beyond that necessary for day-to-day use of the 
Internet. Researchers and students are strongly ad- 
vised to use a consistent interface to the Internet for 
both ease of use and regular referencing methods. 

The URL used to access gopher servers via a 
WWW browser is similar to a hypertext URL but is 
prefaced by the gopher:// tag. For example, the Marx 
and Engels archive can be accessed through this URL: 

gopher://csf.colorado.edu/11/psn/Marx

These URLs are usually less self-explanatory 
and longer than those used by the WWW but remain 
as sensitive to misspelling, upper and lower case 
conflicts and misplaced punctuation. However the 
format used to reference HTML files is equally ap- 
plicable here. Essentially the gopher URL should be 
used instead of the place of publication with the 
remainder of the bibliographic entry treated as a 
reference to a monograph with as much detail being 
provided as possible. 

The relative age of the GopherSpace does, how- 
ever, present problems in accessing full bibliographic 
data. Gopher sites operate at a more institutional 
level than the WWW. While WWW pages have 
readily identifiable individual authors within the 
overarching framework of the institutional server, 
the gopher site and the provider institution have a 
more closely integrated relationship. The WWW
could be said to encourage page authors where
GopherSpace harbours anonymous programmers.
A solution could be to ascribe authorship of
apparently anonymous gopher sites to the smallest
identifiable institutional unit, which will often be a
computer science department. Thus a reference for a
gopher site may appear as:
Library Services c.1995 Internet User Glossary, 
North Carolina State University, gopher:// 
dewey.lib.ncsu.edu:70/7waissrc963 A/.wais/ 
Internet-user-glossary 
The utility of this author ascription is debatable. 
subdivide a single document into many smaller
pieces. The purpose of these tags is to allow a user to
click on a word or symbol at the top of the document
and be immediately moved to where the <a name>
tag is situated. A reference to a sentence preceded by
the <a name="modem"> tag in the introduction
document of the reference above could be referenced
in this manner:

(Collins 1995: modem) or (Collins 1995: aoler1.html#modem)

This level of detail approximates individual references
to pages in a conventional book. The major
problem with this method is the complexity of
ascertaining whether these codes exist within the document.
Not all HTML documents use the <a name> tag.
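
Whether a document contains such named anchors can be checked without reading the source by eye. The following is a minimal sketch using Python's standard html.parser module; the class name AnchorFinder, the function named_anchors and the sample markup are hypothetical and serve only to illustrate the check.

```python
from html.parser import HTMLParser

class AnchorFinder(HTMLParser):
    """Collect the value of every name attribute found on an <a> tag."""

    def __init__(self):
        super().__init__()
        self.names = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names before calling this.
        if tag == "a":
            for key, value in attrs:
                if key == "name" and value:
                    self.names.append(value)

def named_anchors(html_text):
    """Return the anchor names available as reference points in a document."""
    finder = AnchorFinder()
    finder.feed(html_text)
    return finder.names

sample = ('<h1>Introduction</h1><A NAME="modem"></A>'
          '<p>My first modem</p><a href="aoler2.html">Next</a>')
print(named_anchors(sample))
```

An empty list means the document offers no anchors, in which case the filename-only form of the reference is the finest level of detail available.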

A solution to the lack of page numbers for Internet- 
based resources may be to establish a style sheet for 
printing of HTML documents. A preset top, bottom, 
left and right margin coupled with a defined font and 
point size presented as a publicly available style 
guide would allow researchers to set their WWW 
browser's printer set-up to these parameters with the 
understanding that the source code has not altered in 
any way. The resultant pages could be referenced as 
(Collins 1995:[3]) with the square brackets acknowledging
the variable nature of the numbering. Other
viable solutions which could equally be adopted
include numbering paragraphs, counting a fixed number
of lines, for example 25, as a single page, or
having authors include a number as an integral part
of the document between each ‘page’.
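
The line-counting alternative reduces to simple arithmetic. As a hedged sketch, assuming 25 lines per notional page as in the example above (the function names bracketed_page and in_text_reference are hypothetical):

```python
def bracketed_page(line_number, lines_per_page=25):
    """Convert a 1-based line number into a square-bracketed page number."""
    if line_number < 1:
        raise ValueError("line numbers start at 1")
    page = (line_number - 1) // lines_per_page + 1
    return f"[{page}]"

def in_text_reference(author, year, line_number):
    """Build an in-text reference with a bracketed, notional page number."""
    return f"({author} {year}:{bracketed_page(line_number)})"

# Line 60 falls on the third notional page of 25 lines each.
print(in_text_reference("Collins", 1995, 60))
```

Lines 1 to 25 map to [1], lines 26 to 50 to [2], and so on, so two researchers counting the same unaltered file arrive at the same bracketed number.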

Another problem in utilizing the Internet for aca- 
demic referencing is that the researcher must be 
aware of his or her location within the Internet. There 
is a growing tendency for WWW browsers to have a 
default configuration in which the Show Location... 
command is turned off. While it is just a matter of 
clicking it on, a degree of awareness and training is 
necessary. 

There are a number of requirements for consist- 
ent referencing of HTML files. The adherence to 
existing conventions is important. The use of a style 
guide for printed HTML files provides a means for 
referencing larger documents. Authors of HTML 
files should be encouraged to include bibliographic 
material within their files, with a minimum request 
that this material is contained within comment tags 
or the <HEAD>. This information could be coded as 
an actual bibliographic reference for ease of use and 
access. As a non-displayed comment the additions 
would simply be: 

<!-- Greenhill, Anita & Fletcher, Gordon 1995,

Humanities HUB, Faculty of Humanities, Griffith

University, http://www.gu.edu.au/gwis cinemedia/

CineMedia. HUBH.html -->

A similar line could be included in the displayed
document. This would allow the printed document to
be easily reassessed online. The inclusion of a document's
URL on the printed hardcopy of an HTML file
is not something that is currently done automatically
Usenet News

Usenet News can be accessed in a number of different
ways. There currently seems to be no clear
preference in software readers. Some graphical
WWW browsers can access news. The hierarchical
nature of the news systems and the distributed nature
of the material prohibit a conventional URL system
from being used. News is physically held on each
subscribed server. This results in totally meaningless
URLs for later reference. A URL which points to the
Griffith University server is not usable by someone
accessing news from another university.

This different distribution method lends itself
towards a more periodical-orientated style of referencing.
There is usually some form of authorship
acknowledged, although newsgroups do not exclude
the possibility of pseudonyms being used.

News items usually have a header which approximates
a title. The newsgroup itself takes the
role of the journal. The full date of the original
posting is available and can be used in the same
manner as volume and number are used in conventional
journals. This provides a potentially useful
reference. As an example:

Graham, Adrian 1995, ‘Fishing in Mauritius’,
alt.fishing, 29th July.

Unless research was being conducted specifically
on computer-mediated communication, the
stranger postings with the unusual names and titles
would simply not appear as references, such as the
generic:

AOhell 1995, ‘Hi all...’, alt.slack, 20th Feb.

The major problem with references to the Usenet
is the temporary nature of the postings. Not all
newsgroup postings are archived and the references
of today become ether tomorrow. Although a number
of the major newsgroups are archived, finding them
and specific references may be much more effort
than the final piece of information is really worth.
These considerations are really only a major concern
to a researcher going over a pre-existing bibliography.
Accessing this material may be more easily
undertaken by contacting the author (via email?) of
the article/research, assuming that he or she still has
a copy of the material.


Listservers 

Listservers are the closest the Internet has to a hand
delivered journal. The listserver of a specific journal
posts to each subscriber's email address a full copy
of the journal every time a new issue is completed.
Fortunately, the header of an email contains most of
the referencing information needed for constructing
a journal-like bibliographic entry. The order and
type of information will vary from listserver to
listserver but generally the author of the specific
article will be acknowledged as a proper name or
sometimes as an email address. The title and journal
name are covered by the subject and originator sections
of the header. The year will always be the same
However, with the decline in gopher services, there
should be little need to provide references for these 
materials. The majority of new users on the Internet 
prefer the more graphical WWW user interface. This 
filtering effect ensures that some degree of experi- 
ence and skill is developed in referencing and 
accessing Internet resources before the student/re- 
searcher finds it necessary to reference gopher sites. 
Time may also reduce the need for gopher references 
as the information contained on gopher servers is 
transferred to WWW sites. 

The example of the World Factbook also raises 
questions in relation to the distortion between ac- 
tual academic authorship and the digitizing and 
preparation of material for electronic distribution. 
Although there are skills involved in both proc- 
esses academic works require the acknowledgement 
of the academic author of the work. Acknowledge- 
ment of an individual responsible for the digitizing 
could be included after the title of the work. This 
echoes the style for acknowledging editors and 
translators where the original author remains of 
paramount importance. There would appear, how- 
ever, to be little utility in acknowledging an 
institutional body in this role as it is recognized as 
the electronic publisher in the reference and 
usually implied as such in the URL given for the 
document. 


File Transfer Protocol (FTP) 
File Transfer Protocol (FTP) is used to download 
software or text from a remote site to the user. If the 
user is accessing FTP through a WWW browser the 
text is displayed ‘raw’ with little or no formatting. 
These files can normally be attributed to individual 
people with all the appropriate referencing details. 
FTP is the earliest type of Internet publishing and, 
when it was (and occasionally still is) used, the 
material was a digitised version of conventionally 
published material. If the material cannot be accessed 
in the printed edition the URL is, once again, recog- 
nizable: 
ftp://nysernet.org/pub/resources/guides/bigdummy.txt
as is the resultant reference:

Gaffin, Adam 1994, EFF's Guide to the Internet
v2.3, Electronic Frontier Foundation,
ftp://nysernet.org/pub/resources/guides/bigdummy.txt

Those people who access FTP via software other
than a WWW browser can easily convert their reference
to a standard URL by adding the ftp:// to the
front of the server, directory and filename details that
are needed to access the material.
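
The conversion described above amounts to joining three pieces of information behind the ftp:// prefix. A minimal sketch, assuming the server, directory and filename are already known from the FTP client (the function name ftp_url is hypothetical):

```python
def ftp_url(server, directory, filename):
    """Build a standard ftp:// URL from the details used by an FTP client."""
    # Strip stray slashes so the parts join with exactly one slash between them.
    parts = [p.strip("/") for p in (directory, filename)]
    path = "/".join(p for p in parts if p)
    return f"ftp://{server}/{path}"

print(ftp_url("nysernet.org", "/pub/resources/guides/", "bigdummy.txt"))
```

The case of the directory and file names is left untouched, since these details are case sensitive.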

FTP documents again reinforce the need for a 
print-out style sheet which specifies a series of 
standard margins, fonts and point sizes while 
acknowledging that no formatting changes are con- 
ducted on the document. This would allow a square 
bracketed page number to give a general page guide 
for in-line referencing. 
place in academic publishing résumés. Drawing
documents on the Internet more closely in line 
(from a referencing point of view) with conven- 
tional material may assist in bringing about a change 
in policy. This more progressive position will be 
assisted by an increasingly computer-literate stu- 
dent body desirous of incorporating Internet material 
into their work. Encouraging this, through the pro- 
vision of referencing guidelines and tolerance 
towards the use of these documents, will provide a 
richer base of material to draw upon than is some- 
times available through conventional library 
resources. When funding for items such as paper
periodicals is reduced, the ability to access and
reference a hypertext version of journals and other
materials becomes an important commodity. This
academic advantage is increased by the nature of
the medium, which provides infinite copies in contrast
to a single paper copy in a library. Refusal to
recognize electronic publishing has the potential to 
lower the public profile of a university and impede 
its staff development. 
NOTES
1. Fortunately, as with most publishing and public
expression, the anonymous HTML author is relatively
rare.
2. The tilde is used in URLs to indicate that the
directory it immediately precedes is a personal
directory.
year as the year of receipt at the email address. The
date will be contained within the header; when this
does not occur the month of receipt can be utilized in
the reference. Many listservers issue material more
than once a month, but the month used in combination
with the author and title will provide a unique identifier.
Some listservers and newsgroups are quite closely linked;
knowing these combinations will assist the researcher
in providing full bibliographic material.
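
Because the referencing details sit in the email header, they can be pulled out programmatically. The sketch below uses Python's standard email package and assumes the posting carries From, Subject and Date header fields. The function name listserver_reference, the sample message and its author are hypothetical, and the field order is one reading of the journal-like entry described above rather than a fixed rule.

```python
from email import message_from_string
from email.utils import parseaddr, parsedate_to_datetime

def listserver_reference(raw_message, journal, list_address):
    """Assemble a journal-like entry from the header of a listserver posting."""
    msg = message_from_string(raw_message)
    name, address = parseaddr(msg["From"])
    author = name or address           # fall back to the address if no proper name
    sent = parsedate_to_datetime(msg["Date"])
    date_part = f"{sent.day} {sent.strftime('%B')}"
    return (f"{author} {sent.year}, '{msg['Subject']}', {journal}, "
            f"{date_part}, {list_address}")

# Hypothetical posting, for illustration only.
raw = (
    "From: Jane Collins <jcollins@bob.com>\n"
    "Subject: Call for Peer Reviewers\n"
    "Date: Thu, 20 Jul 1995 10:15:00 +0000\n"
    "\n"
    "Body of the posting.\n"
)
print(listserver_reference(raw, "Postmodern Culture", "pmc-listserv.ncsu.edu"))
```

The sketch does not cover a posting with no Date field; in that case the month of receipt would have to be supplied by hand, as the text suggests.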


Email 
Email is the personal communication of the Internet. 
Referencing to email should be undertaken with the 
same judiciousness that is used with all personal 
communication. Personal communication is not ac- 
knowledged in the bibliography of research. In-line 
references simply acknowledge the interlocutor and 
the date with an annotation. Email provides similar 
information, as an example: 

(Bloodaxe, Eric 1995, email, 24th July) 
or if the person's name is unclear the section of the
email address in front of the @ symbol could be used,
e.g. E.Bloodaxe@hum.gu.edu.au becomes:

(E.Bloodaxe 1995, email, 24th July).

It should be noted however, that some institu- 
tions and commercial service providers prefer to use 
a numeric system when allocating personal email 
identities. A minimal email communication con- 
ducted with someone connected to one of these sites 
verges upon incomprehensibility: 

(S450062 1995, email 24 July) 
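
The fallback from a personal name to the front section of the email address can be expressed as a short sketch. The function name email_reference is hypothetical, and the domain example.edu used for the numeric identity is a placeholder, since the text gives no address for it.

```python
def email_reference(address, year, day_month, name=None):
    """Build an in-line reference to an email, preferring the person's name."""
    # Without a clear name, use the section of the address in front of the @.
    who = name or address.split("@", 1)[0]
    return f"({who} {year}, email, {day_month})"

print(email_reference("E.Bloodaxe@hum.gu.edu.au", 1995, "24th July"))
print(email_reference("S450062@example.edu", 1995, "24th July"))
```

The second call shows why numeric identities are a problem: the reference is well formed but tells the reader nothing about the correspondent.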


Conclusion 
The methods outlined here rely heavily
upon the phenomenon of the World Wide Web and its
method of accessing Internet resources. This provides
backward compatibility with earlier resources
as well as a standard on which to base future developments
(how do you reference the ‘writing on the
wall’ of a virtually modelled room?).

Currently, papers and articles published on the 
Internet are not recognized as having a legitimate 
fileformat://computer.type-of-system.country-code/file-directory(s)/file-name
Thus
http://www.gu.edu.au/gwis/hub/hub.home.html
refers to
• a hypertext document
• based at the Griffith University server (www.gu)
• which is an educational institution (edu)
• physically located in Australia (au)
• the file itself is nested within two directories
(/gwis/hub)
Introduction
The methods used to reference material gained from
the Internet should follow existing referencing styles.
The conventions established by the World Wide
Web (WWW) to access a variety of file formats
provide a concise system of incorporating Internet-
based resources into referenced works. This system
is based upon the Uniform Resource Locator (URL).
In graphical WWW browsers, activating the option
Show Location reveals the URL of each document
being accessed. The generic format of a URL is:
this material may be more easily undertaken by
contacting the author (via email?) of the article/
research, assuming that he or she still has a copy of
the material.


Listservers 

The listserver posts to each subscriber's email address
a copy of an electronic journal every time a new
issue is available. The periodical nature of this material
allows researchers to use conventions similar to
those for WWW electronic journals. The major difference
is the replacement of the URL with the
subscription email address of the listserver.


Email
Email is the personal communication of the Internet.
Referencing to email should be undertaken with the
same judiciousness that is used with all personal
communication. Personal communication is not
acknowledged in the bibliography of research. In-line
references simply acknowledge the interlocutor
and the date with an annotation. If the person's name
is unclear the section of the email address in front of
the @ symbol could be used. As an example
E.Bloodaxe@hum.gu.edu.au becomes

(E.Bloodaxe 1995, email, 24 July)


Page numbering
Page numbering is not usually available in electronic
documents. A solution may be to establish a style
sheet for the printing of Internet documents. A preset
top, bottom, left and right margin coupled with a
defined font and point size presented as a publicly
available style guide would allow researchers to set
their printers to these parameters with the understanding
that the actual text has not altered in any
way. The resultant pages could be referenced as
(Collins 1995:[3]) with the square brackets indicating
the variable nature of the numbering.
As a suggestion:

Top margin: 2.54cm (1 inch)

Left margin: 3.17cm (1.25 inches)

Right margin: 2.54cm (1 inch)

Bottom margin: 2.54cm (1 inch)

Font: 10 point Times New Roman (TrueType), Normal

Spacing: single line spacing with no user-defined
page breaks


Examples of Internet resources as references 
The proposed referencing system focuses upon the
Uniform Resource Locator (URL) primarily used by
the World Wide Web (WWW) and widely applicable
to a variety of Internet resources.


References in the bibliography 

A World Wide Web Document 
Brenner, Anita 1995, The Murder Trial: Genre 
Or Event-Scene?, C-Theory, http://english- 
server.hss.cmu.edu/ctheory/e-murder trial html 
• and is identified by the name hub.home.html
• the ending html also indicates a hypertext document
format.
The file name is case sensitive, thus HUB.home.HTML
does not point to the same file as above.
Researchers providing a URL for Internet-based
material should be aware of this when referencing
files.
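
The breakdown of the example URL can be reproduced mechanically. The following is a minimal sketch using Python's standard urllib.parse module; the function name describe_url and the dictionary keys are hypothetical labels chosen to echo the generic format, not part of the referencing system itself.

```python
import posixpath
from urllib.parse import urlsplit

def describe_url(url):
    """Split a URL into the parts named in the generic format."""
    parts = urlsplit(url)
    directory, filename = posixpath.split(parts.path)
    return {
        "fileformat": parts.scheme,                       # e.g. http, gopher, ftp
        "computer": parts.hostname,                       # server name
        "directories": [d for d in directory.split("/") if d],
        "filename": filename,                             # case sensitive, kept as given
    }

info = describe_url("http://www.gu.edu.au/gwis/hub/hub.home.html")
print(info)
```

The filename is returned exactly as written, so hub.home.html and HUB.home.HTML remain distinct, which is the distinction a reference has to preserve.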

Utilizing the full URL provides a level of redundancy
in the bibliographic entry which allows the
cross-checking of references. This method also enables
those sites with no apparent publisher still to be
referenced.

As a slight deviation from convention the final
full stop in the bibliographic entry should be omitted
to avoid URL addressing confusion.


World Wide Web (WWW) pages 

WWW pages can be referenced in the same manner
as a conventional monograph. The author, title and
date of the page can be extracted and used
conventionally. The publisher information is replaced
by the institution or group that maintains the
computer server upon which the WWW pages reside.
The place of publication is replaced by the
URL of the WWW page. In some cases the WWW
pages are associated with an electronic journal.
These pages can only be referenced as journals if
more complete volume and number or date information
is available.


File Transfer Protocol (FTP) 

File Transfer Protocol (FTP) is generally used to
download software or text from a remote site to the
user. If the user is accessing FTP through a WWW
browser the text is displayed ‘raw’. This type of
file can normally be attributed to individual people
with all the usual referencing details. FTP is the
earliest type of Internet publishing and, when it
was (and occasionally still is) used, the material is
often a digitized version of conventionally published
material. If the material cannot be accessed
in the dead tree edition the URL is, once again,
recognizable:

ftp://ftp.utuc.edu/pub/reference/grammar.txt 


Usenet news 

The distributed nature of the Usenet news prevents 
the URL from being used within a bibliographic 
entry as it produces totally meaningless URLs for 
later reference. 

Usenet news lends itself towards a more periodical-
orientated style of referencing. There is usually
some form of authorship. News items usually have a
header which approximates a title. The newsgroup
itself takes the role of the journal. Dates are available
which can be used in place of the volume and number
details of conventional journals.

The major problem with references to the Usenet
is the temporary nature of the postings. Accessing
A Usenet news article
Graham, Adrian 1995, 'Fishing in Mauritius', 
alt.fishing, 29th July. 


An emailed Listserver document (include an indi- 
vidual author if possible, including the journal name 
after the article title) 
Postmodern Culture 1995, ‘Call for Peer Reviewers’,
20th July, pmc-listserv.ncsu.edu


Referencing within the text 


World Wide Web, Usenet news, FTP document and 
electronic journals 
(Foucault 1995) 


Email (is personal correspondence and not included 
in a bibliography of references cited) 
(Yano, Masaki 1995, email, 26th July) 
A World Wide Web Electronic Journal (where both
the title of the journal and volume (or date) information
are available)
Foucault, Michel 1995, ‘Madness, the Absence
of Work’, excerpts, tr. P. Stasny & D. Stengel,
Critical Inquiry, vol.21, no.2, Winter,
http://www.uchicago.edu:80/u.scholarly/CritIng/v21n2.foucault.html


A World Wide Web Document with minimal details
available (e.g. no author, journal, volume or institution
information)
Humanities HUB ≤1995,
http://www.gu.edu.au/gwis/hub/hub.home.html


An FTP document 
Gaffin, Adam 1994, EFF’s Guide to the
Internet v2.3, Electronic Frontier Foundation,
ftp://nysernet.org/pub/resources/guides/bigdummy.txt
Batt's surveys of information technology in UK public
libraries 64, 65


Bibliographic checking (effects of information technol- 
ogy) 204 
Bibliographic references (in Internet articles) 248 
Binding of library materials 728 
Bit-mapped image systems 13, 22 
BLAISE-LINE online service 232 
BLCMP library service 3/ 
Book publishing 163-170, 197-201 
Books in Print 231 
Bookseller 232 
Boolean logic 15, 163, 164, 177 
British National Bibliography 231, 232 
British National Corpus Initiative 7 
British Telecom Information Resource Centre 790-193 
Browsers (for World Wide Web) 24, 37 
Bulletin board systems 47, 43, 45, 196 
— as a source of marketing information 178 
Business forecasting 175-178 
Business information 119-126 
— evaluation of services 725-126 
— funders 2 
- role of alliances and partnerships 23 


- target audiences 3 
— users 2 


Cable TV as a means for electronic publishing 166, 241 
CAFE card 242 
Cambridge Cable cableservice 1 


Campus-wide information servers 28 
— see also Local area networks 


Candide project (for machine translation) 215 

Carrefours (Rural Information Centres, in Scotland) 35, 37 
Cataloguing (effects of information technology) 204 
Cathode Ray Tube monitors 141, 164 


CATALYST controlled language system (for machine transla- 
tion) 218 


CD-I technology 164, 169 
CD-recordable technology 166-167 


CD-ROM in public libraries 63-77 
— advantages over other information sources 65 
— benefits 68 
— ease of use 64 
— evaluation of titles 67 
— impact on services 67-68 
— impact on staff 7 
— literature about 64 
— monitoring of service 68 
— problems 68, 69 
— promotion and training 68 
— selection of titles 67 
— user interfaces 64 
— user reactions 68-70 


CD-ROMs 
— end user interfaces 779-184 
— use in publishing /64, 165, 198, 226, 229-232 
— promotion of 165 
— security considerations 166-167 
— standards for 165 
— see also CD-ROM in public libraries 
EBMT see Example based machine translation systems
EDINFO (Edinburgh University CWIS) 31 


EDR project see Electronic Dictionary Research project (for 
machine translation) 


Edutainment titles in electronic publishing /67 
Electricity supply industry (use of artificial intelligence)
153-160


Electronic books 13-22, 16-17, 44, 139 
- see also Electronic libraries; Electronic publishing;
Talking books


Electronic cash tokens in electronic payment systems /47- 
148 


Electronic classroom (concept) 201, 202 


Electronic communication 195-202 
- see also Computer networks; Internet; Local area
networks; Telecommunications


Electronic conferencing systems 4 


Electronic Dictionary Research project (for machine transla- 
tion) 215 

Electronic display technology 140 
— readability 741 


Electronic journals 246, 247, 251


Electronic libraries 13-22, 195-202 
— effects on staff 203-208 


Electronic Library SGML Applications project 22 
Electronic library systems 13-2 


Electronic mail 36, 41, 43, 44, 45-46, 99, 141, 188, 196, 198,
201, 206, 237, 238, 240, 246, 250
—referencing 250, 251 


Electronic newsletters 798 


Electronic paper (concept) 739-142 
- aesthetic considerations 2 
— defined 139 
- manipulation of 742 
— structured representations 7 


Electronic payment methods (over Internet) 745-152, 241- 
243 
— anonymity 6 
— authenticity 747, 239 
— confidentiality 7 
— cryptographic techniques 147-148 
— double-spending prevention ۵ 
— electronic cash tokens 147-148 
— encryption 147, 239 
— extrinsic security mechanisms 6 
— integrity 747 
— legal and political considerations /5/ 
— privacy considerations 6 
— security considerations 6 
— trusted third parties /45-146 


Electronic publishing 44, 139-142, 163-170, 250 
— benefits 739 
— database creation 1 
— delivery methods 165-166 
— future prospects 226-227 
— hardware requirements 4 
— online delivery of electronic publications /66, 231 
— overhead costs 232 
— marketing 232 
— pricing considerations 229-232 
— Software requirements 764-165 
— use of World Wide Web 166 
— see also Electronic books; Electronic journals 


Electronic seminars (concept) /98, 202 
Electronic sources of marketing information /78 


ELINOR electronic library 13-22 
— document reading 3-6 
— document retrieval 4-5 
Corpora (for translation) 54, 57, 74, 83-92, 106-107, 211,
215-219
— testing 95-97 


Corpus-based translation drafting tools 83-92 
Correlation (statistical technique) 79 


COTECH (restricted technical English vocabulary for te- 
lephony) 74, 78, 80, 81 


Credit card transactions on Internet 746, 242 
- see also Electronic payment methods (over Internet) 


CRITTER stock market reports machine translation sys- 
tem 218 


Cross-cultural communication 4/ 


Cross-group working (between library/computer staff) 206-207


CRT screens 141, 164

Cryptographic techniques in electronic payments 147-148 
Custom information services 226 

CyberCash electronic payment system 151 

Cyberspace 41-46 


Data analysis (in research) 7-8 
Data compression 76 
Data Discman (Sony multimedia system) 8 


Data protection legislation (applied to Internet) 238, 239, 
240 


Data security (on World Wide Web) 32 
Database management systems /5 


Database systems 15, 43 
— database of databases 5 
— see also Online databases 


DBMS see Database management systems 

Decision making 3, 773 

Decision support systems 4 

Defamation law as applied to Internet 238 

DELIS linguistic research and engineering programme 57 
Delphi Internet service 44 

Design of research 4, 6 

Developmental Medicine & Child Neurology (journal) 134 


Diagnostic evaluation of natural language processing sys- 
tems 95 


Dialog online service 232 

DigiCash electronic payment system 751, 242 
Digital representations see Electronic publishing 
DIP see Document image processing systems 
Discontinuous change 195-202 

Distance learning 44, 226-227 

DLT machine translation project 212

Document delivery services 5 

Document image processing systems 13, 4 


Documentary study (in research) 7


Domain names on Internet 237-238 
Dorling Kindersley as an electronic publisher 7 


Double-spending prevention in electronic payment sys- 
tems 148 


DRAFTER project (for technical drafts in English/French) 57 


DVB-OSI (proposed link between German regional sys- 
tems) 4 


Dynamic security assessment (in electricity supply indus-
try) 154


EAGLES see Expert Advisory Group for Language Engi-
neering Standards
GIF (graphical format) 37
GIST linguistic research and engineering programme 57 
Global information network (concept) 224 


Gopher (Internet tool) 23-24 
— referencing in 248-249 


Grammar checkers/correction in machine translation 73-82, 
96, 97, 213, 214 


Graphical user interfaces 74, 90, 156, 164, 224, 246 
— see also Human-computer interfaces (for information 
systems) 


GUI see Graphical user interfaces 


Hand-held electronic information browsers 142 
Help desk services 103 
Hermeneutic approach to design 6 


Highlights (National Children's Bureau research sum- 
mary) 132 


Historical research 5 
Holistic research 8 
Holography (for optical storage) 165


Home market penetration by information technology 224 
— electronic publishing considerations 167 


HTML see HyperText Markup Language (on World Wide 
Web) 


HTTP see HyperText Transfer Protocol (on World Wide 
Web) 


Human-computer interfaces (for information systems) 3, 
114 


Hyperonyms (in machine translation) 84 
Hypertext document browsers 13, 24 


HyperText Markup Language (on World Wide Web) 24, 27, 
28, 31, 245-248, 251 


Hypertext systems 23-32, 163, 164, 201, 245-252 


HyperText Transfer Protocol (on World Wide Web) 24, 28, 
31, 148 


Hyponyms (in machine translation) 4 
Hypothetico-deductive research 5-6 


Image databases 73-4 

Image maps (in World Wide Web) 7 
Image systems (bit-mapped) 13, 22
IMPEL project 203-208 

Inductive research 6 

Inference engines (in expert systems) 154
Informal information networks 111, 124 
Informatic systems 111-116 


Information 
— as a marketable resource 1 
— as a strategic resource 109-111, 119-126
— paying for open network information 145-152
— perceived value 229
— presentation methods 7
— use in organizations 109-116


Information audits 110, 114, 116
Information awareness 3, 71, 109-116, 193
Information capitalism (concept) 109
Information economy 120-121

Information gateways 193 

Information kiosks 43, 241 

Information management 109-116, 119-126
Information mastery (concept) 109-116
Information overload (concept) 4 
— pilot system architecture 14
— usability studies 16-17
— user interface 4


ELS see Electronic library systems 
ELSA see Electronic Library SGML Applications project 


ELU engine see Environnement Linguistique d'Unification
machine translation engine


Encryption 
— in electronic payment systems 147, 239
— in electronic publishing 167


Encyclopedia Britannica as an electronic publication 166


End user searching 205 
— of CD-ROM 63, 64, 65, 67-71, 179 


Energy Management Systems 6 


Environnement Linguistique d'Unification machine transla-
tion engine 214, 218


Euramis interface for Systran 104, 105 


Eurodicautom databank 45, 104 
— bridge to Systran 105-106


Eurolang Optimizer (translation memory system) 54, 72
Europagate Z39.50 implementation 4
Europe Online online service 241

European Centre for Particle Physics see CERN 
European Documentation Centres (in Scotland) 35-36, 38-39 
European Information Association 35, 39 

European Information Centres (in Scotland) 35, 36-37 
European information in Scotland 35-40 

European Reference Centres (in Scotland) 37 

Eurotra machine translation project 2 

Evaluation of information technology systems 204 


Evaluation of natural language processing systems 95-97, 
106-107 


Example based machine translation systems 83-85, 216, 218 
Expanded forms of terms (translation considerations) 50 


Expert Advisory Group for Language Engineering Stand- 
ards 57, 95 


Expert systems 153, 154-158 
Extrinsic security mechanisms (for electronic payments) 146 


Fax communications 41, 42, 44, 124, 139, 188, 222, 239


Fibre-optic networks 124, 153, 188, 222-227
— use in electronic publishing 166
— see also Computer networks 


File Transfer Protocol (on Internet) 24 
— referencing 249, 251


Firewalls (as network security devices) 239 
First Bank of Internet electronic payment system 151 
First Data credit card handling agency 148 
First Virtual electronic payment system 7 
Forecasting in technology research 175-178 
Formal information systems 1 

Forms facility (on World Wide Web) 28, 37 
Fraud (via Internet) 238, 239, 242 

Free text retrieval 75 

Frequency domain optical storage 5 

FTP see File Transfer Protocol (on Internet) 
Full text information retrieval 139

Fuzzy logic 153, 154 


Fuzzy match mechanisms in translation memory systems 67- 
89, 107 


Generation in machine translation 74-78, 96 
Genetic algorithms 3 
Journal of Adolescence 132, 134
Journal of Child Psychology and Psychiatry 4
Journal of Youth and Adolescence 134
Journal publishing 197-198
— on Internet 245-252 


— pricing factors 8 
— see also Electronic journals


Just-in-time information storage and retrieval 4 


KDD Teleserve translation service 44 

KeyWord in Context tools (for translation) 54 

Knowledge 3, 109 
— as a business resource 119-126
— communication of 119-120, 122-123, 124, 195-202
— packaging of 125 

Knowledge bases (in expert systems) 4 

Knowledge economy (concept) 109 

Knowledge-based approaches to machine translation 211,
212, 217, 218

KWIC tools (for translation) 54 


Language factors in telecommunications 41-46

Laptop computers 7 

LCD displays 140-141, 164, 226 

Learned societies 197-198 

Lemmatization of wordforms in translation 86-87, 91-92 

Lexical selection (in sublanguages — translation considera- 
tions) 52-57 

Lexicons and lexical analysis (in machine translation) 47, 73, 
74-81, 211-219 


Librarians 
— cross-group working with computer staff 206-207 
— job satisfaction 204-205
— perceived effects of information technology 203-208 
— role in post-information revolution libraries 194, 201 
— technical skills 194, 205-206 


Libraries 

— effects of information revolution 0 

— needs of individual users 190 
Library & Information Statistics Tables (LISU) 64 
Library and Information Science Abstracts 23 
Library Association Record 65 
Library use of World Wide Web 27 

— in academic libraries 27-28 


Linguistic Research and Engineering (LRE) program 83-92, 
95 


Listservers (referencing) 249-250, 251 

Literature reviews (as part of research) 3-4 

LMT machine translation system 2 

Load forecasting (in electricity supply industry) 7:74 


Local area networks 124, 241
— intrusion risks 238-239
— see also Computer networks


Machine intelligence 189


Machine translation systems 46, 73-82, 83-92, 95-97, 99-107 
— controlled language checkers 97, 217-218 
— corpus based systems 215-219
— effects on role of human translators 99-100
— evaluation 9
— example based systems 83-85, 216, 218
— knowledge-based approaches 211, 212, 217, 218
— methodology 211-219
— multilingual text generation 217
— post-editing services 100-103, 212, 217 
Information providers on Internet
— service agreements 239, 241


Information revolution (concept) 187 
Information society 196, 222-227 


Information superhighways 41, 42, 188 
— use in electronic publishing 6 
— see also Internet; World Wide Web


Information superstructures 188-194 


Information technology 41, 46, 64, 113, 115, 116, 123-125,
153
— as part of the information revolution 187-194
— effects on bibliographic checking 204
— effects on office organization 224
— future prospects 221-227
— strategies in UK university libraries 203-208
— see also CD-ROM; Computer networks; Electronic
publishing; Internet; Telecommunications
Information value chains 110
Innovation 119, 196, 197
— see also Change
Integrated Services Digital Network 166, 6 
Integrity in electronic payment systems 7 
Integrity of data (on World Wide Web) 32 
Intellectual property see Copyright 
Intelligent agents (in information technology) 189
Inter-organizational information systems 123-4 
Interception of Internet transactions 239 
Interleaf text processing system 8/7 
International Value Added Network Services 46 
Internet 23-32, 41, 42, 45-46, 124, 145-152, 183, 188, 189,
194, 196, 198, 201, 205, 226, 229, 231
— authenticity in electronic payment systems 147, 239
— bibliographic references 248
— copyright considerations 236-237, 239
— credit card transactions 242
— data protection regulation 238, 239, 240
— defamation law 238, 239
— domain names 237-238
— electronic payment methods 145-152, 241-243
— File Transfer Protocol 24, 249, 251
— fraud 238, 239, 242
— gopher (Internet tool) 23-24, 248-249
— interception of transactions 239
— intrusion considerations 238-239
— legal considerations 235-240
— listservers (referencing) 249-250, 251
— pornography (legal aspects) 238, 239
— search engines 246
— secure protocols 148
— security 238-239
— service agreements with IPs 239, 241
— use for electronic payments 145-152, 241-243
— use for electronic publishing 6
— use of trademarks 237-238
— see also World Wide Web
Interpersonal communications systems 6 
Interviews (as part of research) 4, 7 
Intrinsic security mechanisms (for electronic payments) 146
IRIS (Irish current awareness service) 3 
ISDN see Integrated Services Digital Network 
ISINDEX tag (in HTML) 26, 31 


IVANS see International Value Added Network Services 


JANET network 65, 203 

JANUS speech translation system 218

Job satisfaction (effects of information technology) 204-205 
Job security (effects of information technology) 207 
Online databases 68, 145, 196, 198, 201
— for marketing information 7 
— see also Database systems 


Online delivery of electronic publications 6 

— pricing 166, 231
Online document delivery services 145
Online payment methods (over Internet) 145-152, 241-243
OPACs 15, 17, 22, 27-29, 31, 63, 183, 196
Open Market electronic payment system 7 
Open network payment systems 145-152 
Operational research 9 
Optical character recognition 75 
Optical fibre networks see Fibre-optic networks 
Optimizer machine translation workstation 212 
Order-based systems (for electronic payments) 146, 147 
Organizational research 5 


Organizational use of information 109-116, 119
— role of organizational structure 121-122


PAHO machine translation system 218

Paired t-tests (statistical technique) 49 

PANDA Project (for newspapers on CD-ROM) 67 
Pangloss machine translation project 212, 218

Paper (use in work) 739-2 

Parsers in machine translation 97 

Pattern matching (in translation) 54-57 

PC-VAN (Japanese network) 44, 46, 219 

PCMCIA cards (for optical storage) 165

PDAs see Personal Digital Assistants 

PECOF feedback mechanism (for machine translation) 217
Perfect match mechanisms in translation memory systems 7
Periodicals as a source of marketing information 178
Personal Digital Assistants 241

Photocopying 8 

Pixlook bit-mapped image system 3 

PixTex/EFS document image processing system 14, 15


PLNLP system see Programming Language for Natural 
Language Processing machine translation system 


Pornography on Internet (legal considerations) 238, 239 
POS tags (in machine translation) 4

Positivistic approach to design 6 

Post-editing services for machine translation systems 100-
103
Preservation (of library materials) 127-129
— role of users 128


Pretest-posttest designs (in research) 6 

Pretty Good Privacy digital signature implementation 7 
Printing (development of) 187 

Privacy-Enhanced Mail standard (Internet) 7 

Proactive information sourcing 190, 193 


Probabilistic Term Weighting (statistical technique in infor-
mation retrieval) 164


Profiles of needs of library users 0 


Programming Language for Natural Language Processing 
machine translation system 4 


Progress evaluation of natural language processing systems 95


Project ION (interlending systems link between the UK, 
Holland and France) 4 


Proposals (for research projects) 4-5 
Public Information Relay (EC initiative) 35, 38, 39 
— rule-based systems 211-215, 218
— sublanguages 52-57, 91, 217, 218, 219
— text generation 217


Management use of information 109-116

MAPTRAN machine translation system 7 

Market research reports 177

Marketing information for technology research 176-178 
Mastercard credit card company 243 

MCI Mail online service 46 

Mediation in translation 47-48 


Message understanding systems (in natural language process- 
ing) 96 
Metal (machine translation system) 73-81, 211, 212 
— analysis phase 8 
— generation of output 78-80
— transfer phase 78 


Meteo meteorological machine translation system 218
Microsoft as an electronic publisher 167, 168

Microsoft Network online service 241

Minitel French network 219

Mondex electronic payment system (NatWest) 151, 242, 243
Monographs (electronic publication of) 198

Morphological paradigms (in machine translation) 84 
Mosaic (World Wide Web browser) 24, 31 


Most Specific Common Abstraction (MSCA) translation 
tool 84-85 


MPEG video data compression standard 165, 166 
MULTEXT linguistic research and engineering programme 57 
Multidimensionality in concept systems (in translation) 49 
Multilingual text generation (in machine translation) 7 


Multimedia systems 24, 42-43, 63, 114, 164, 166, 167, 189,
196, 198, 222
— see also Electronic publishing; World Wide Web 


Mutual Information (lexical tool) 58 


Natural language processing systems 47, 95-97, 214, 217 
— evaluation 95-97, 106-107 
— message understanding systems 96 


Natural language retrieval 15 

Naturalistic research 6-8 

NEC PIVOT machine translation system 7 
NetBill electronic payment system 151
Netcheque electronic payment system 7
NetChex electronic payment system 151


Neural networks 153, 154, 158-159 
— indexes 15 


Newsletters as a source of marketing information 178
Newspapers on CD-ROM 66-67, 68 

NIFTY-Serve (Japanese network) 44, 46, 219 
Non-repudiation in electronic payment systems 7 
Nordic SR-Net (union catalogue linking system) 4 
Normalization of corpora (for machine translation) 85 
Notebook computers 0 


OCR see Optical character recognition 
Off-site access to libraries 193 
Office of the European Commission — Edinburgh 35 


Office organization (effects of advances in information tech- 
nology) 224 


Offline delivery of electronic publications 165-166
Socker Z39.50 implementation 4
Source origin/type of terms (translation considerations) 49,
50


Speech (development of) 7 

Speech and Language Technology (SALT) programme 57 

SPSS/PC+ statistical package 7 

SSL see Secure Sockets Layer 

Staff requirements for research 5 

Standard Generalised Markup Language 24, 79, 80, 81, 125 

STAR workstation (for machine translation) 7 

Statistical analysis (of data) 9, 17-19 

Style correction in machine translation 73-82 

Sublanguages (machine translation considerations) 47, 52- 
57, 91, 217, 218, 219 

Subscriptions as a charging mechanism (for online serv-
ices) 145

SuperBook hypertext document browser 3 

Surveys (in research) 9 

Synonyms (translation considerations) 49, 84 

Syntactic checking (in machine translation) 73 

Syntactic transfer (in machine translation) 211 


Systran (EC multilingual machine translation system) 99- 
107, 218 
— as a drafting tool 103
— as an information tool 104
— as a terminology pre-processing tool 103, 104
— bridge to CELEX 4
— bridge to Eurodicautom 105-106
— Euramis interface 774
— future needs 7
— informatics access procedures 104
— language policies 6
— linguistic development 106-107
— user support 3
— users 103-104


t-score lexical dissimilarity test 8 
Talking books 226 
Target audiences for business information services 123
TCP/IP protocol (on Internet) 14, 148
Tele-learning 41, 44
Telecommunications 41-46, 113, 119, 124, 188, 195, 236
— infrastructure 222
— language factors 41-46
— role in electronic publishing 168
— standards 195, 6
— see also Computer networks; Information technology;
Internet
Teleconferencing 42-43
Telepresence (infomatics) 226
Teleshopping 41, 43, 45
Teletranslation services 44-45, 46 
Telnet protocol (on Internet) 27, 28, 31 
Term banks (for translation) 49, 54, 104, 219 
Term recognition tools (for translation) 58-59 


Terminology 
— in research 3
— in translation services 47-60, 104, 106


Test corpora in natural language processing 95-97 


Test Suites for Natural Language Processing (TSNLP)
project 95-97


Text generation (in machine translation) 217
Text matching in corpora 86-89, 99
Text preprocessing in corpora 83-86 7 Ul 


libraries
— effects of information revolution 190-194
— use of CD-ROM 63-72


Publication by academics 197
— economic aspects 7 


Publishing industry in the UK 163-170, 198-201 
— see also Electronic publishing


Qualitative research 5, 6-8, 203 
Quantitative research 5-6, 203 
Questions (in research) 4, 9, 16 


Radio communications 224 

Reduced forms of terms (translation considerations) 50 
Reference books 8 

Remote access to libraries 193

Remote learning 226-227 


Research by academics 196-197
— assessment 7
— publication of results 197-198


Research methods 3-10
— conscious partiality 8
— design of study 4, 9
— role of theory 4
— reliability 9
— staff requirements 5
— software packages 9
— techniques 8-9 


Resources (for research) 5 

Rosetta machine translation project 2 

Rule-based machine translation systems 211-215, 218 
Rural Information Centres (in Scotland) 35, 37 

Rural Society database 37 


Sampling and significance testing (in research) 9 
Satellite communications 224
Scanning of text 99 

Scholarly communication 195-202

Science Citation Index 23 

Scope of terms (translation considerations) 49 


Search engines 
— for databases 198
— on Internet 24, 246 


Searchable directories (on World Wide Web) 7 


SECC see Simplified English Grammar and Style Checker/ 
Corrector project 


Secure HyperText Transfer Protocol 148 
Secure Internet protocols 8 
Secure Sockets Layer (on Internet) 148
Secureweb (World Wide Web security toolkit) 2 
Security in electronic payment systems 45 
Sentence alignment in translation 85-86, 91 
SGML see Standard Generalised Markup Language 
SHTTP see Secure HyperText Transfer Protocol 
Simplified English 73-81
— see also Simplified English Grammar and Style Checker/
Corrector project
Simplified English Grammar and Style Checker/Corrector
project 73-2
— evaluation of output 0
— user interfaces 80-81
Smart cards 242 


Social Science Citation Index 23 
Universities 196-197
University of Cape Coast Library 127 


URI see Universal Resource Identifiers (on World Wide 
Web)


URL see Universal Resource Locators (on World Wide Web) 
Usage of terms (translation considerations) 49, 50 


Usenet news 238
— referencing 249, 251


Utopia (proposed Microsoft graphical user interface) 164 


Value of information (perceived) 229 

Variant forms of terms (translation considerations) 49-7 
Verbmobil machine translation project 218

Video on demand services 168, 225, 241 

Video teleconferencing 42-43 

Videotron online service 241, 242 

Virtual action (in computer procedures) 236, 237 

Virtual reality 224, 227 

Visa credit card company 243 

Visa digital purse project (electronic payment system) 7 
Voice input systems 224 


WAIS (Internet tool) 23-24 

Weeding of library materials 129 
Whittaker's Books in Print 232 

Wide Area Information Servers see WAIS 
WinSPIRS interface 179 

Wordforms (in translation) 48, 58-59 
WORDNET teletranslation service 44-45


World Wide Web 23-32, 145-152, 178, 189, 190, 198, 201, 
235-240, 245-252 
— advertising 240 
— commercial applications 6 
— library use 27-28 
— security capabilities 148
— use in electronic publishing 6
— user friendliness 24, 246 
— see also Internet 


Writing (development of) 7 


X.400 mail servers 46, 239
X.509 network public key standard 7 


Z39.50 protocol 183-184 
Text type analysis (in translation) 59
Text-based navigation systems (in information bases) 193
Textbooks (electronic publication of) 201

Theory (role in research) 4 

Time-stamping in electronic payment systems 147 

TM/2 (IBM translation memory system) 54 


Token-based systems (for electronic payments) 146, 147,
148, 151
TOPIC software 190 
Tovna machine translation system 7 
Trademarks (use on Internet) 237-238, 240 
TRADEX military machine translation system 218 
TRADOS workstation (for machine translation) 217
Transaction-based pricing (on online services) 5
Transduction rules (in machine translation) 213
Transfer lexicons (in machine translation) 73-74 
Translate! translation service 44 


Translation 41-46, 47-60, 73-82, 83-92 
— in teleconferences 42-43 
— lemmatization of wordforms 86-87, 91-92 
— memory tools 54-57, 83
— reduced forms of terms 50
— sentence alignment 85-86, 91
— source origin/type of terms 49, 50
— term banks 54, 219
— term recognition tools 58-59
— usage of terms 49, 50
— variant forms of terms 49
— see also Machine translation 
Translation memory systems 54-57, 83-92 
Translator's Workbench (translation memory system) 54 
Translearn corpus-based translation drafting tool 57, 83-92 
— system architecture 89-90
TRANSTERM linguistic research and engineering pro- 
gramme 7 


Trusted third parties (for electronic payments) 145-146 


ULTRA machine translation system 212
UnCover 23 
Union List of CD-ROMs in London Libraries 64, 65 
UNITRAN machine translation system 212, 214, 215 
Universal Resource Identifiers (on World Wide Web) 24


Universal Resource Locators (on World Wide Web) 24, 31, 
240, 245-252 